subject:"\[jira\] \[Commented\] \(HBASE\-17125\) Inconsistent result when use filter to read data"

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-08-17 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131125#comment-16131125
 ] 

Hudson commented on HBASE-17125:


FAILURE: Integrated in Jenkins build HBASE-14070.HLC #233 (See 
[https://builds.apache.org/job/HBASE-14070.HLC/233/])
HBASE-17125 Inconsistent result when use filter to read data (zghao: rev 
4dd24c52b84c74a477e00ab6177d081c29462dd8)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/Get.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanQueryMatcher.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/UserScanQueryMatcher.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/Query.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanWildcardColumnTracker.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java


> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0, 3.0.0
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.018.patch, HBASE-17125.master.019.patch, 
> HBASE-17125.master.020.patch, HBASE-17125.master.020.patch, 
> HBASE-17125.master.021.patch, HBASE-17125.master.022.patch, 
> HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-08-11 Thread Chia-Ping Tsai (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123164#comment-16123164
 ] 

Chia-Ping Tsai commented on HBASE-17125:


+1

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0, 3.0.0
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.018.patch, HBASE-17125.master.019.patch, 
> HBASE-17125.master.020.patch, HBASE-17125.master.020.patch, 
> HBASE-17125.master.021.patch, HBASE-17125.master.022.patch, 
> HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-08-11 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123083#comment-16123083
 ] 

Hudson commented on HBASE-17125:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #3511 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/3511/])
HBASE-17125 Inconsistent result when use filter to read data (zghao: rev 
4dd24c52b84c74a477e00ab6177d081c29462dd8)
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/Get.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/Query.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanWildcardColumnTracker.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanQueryMatcher.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/UserScanQueryMatcher.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java


> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0, 3.0.0
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.018.patch, HBASE-17125.master.019.patch, 
> HBASE-17125.master.020.patch, HBASE-17125.master.020.patch, 
> HBASE-17125.master.021.patch, HBASE-17125.master.022.patch, 
> HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-08-10 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122891#comment-16122891
 ] 

Hudson commented on HBASE-17125:


FAILURE: Integrated in Jenkins build HBase-2.0 #308 (See 
[https://builds.apache.org/job/HBase-2.0/308/])
HBASE-17125 Inconsistent result when use filter to read data (zghao: rev 
8197a31bbc4c49b4edfc2a0f01b3ef29b40e268d)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/Query.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanQueryMatcher.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/Get.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/UserScanQueryMatcher.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanWildcardColumnTracker.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java


> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0, 3.0.0
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.018.patch, HBASE-17125.master.019.patch, 
> HBASE-17125.master.020.patch, HBASE-17125.master.020.patch, 
> HBASE-17125.master.021.patch, HBASE-17125.master.022.patch, 
> HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-08-10 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122841#comment-16122841
 ] 

Guanghao Zhang commented on HBASE-17125:


Pushed to master and branch-2.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0, 3.0.0
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.018.patch, HBASE-17125.master.019.patch, 
> HBASE-17125.master.020.patch, HBASE-17125.master.020.patch, 
> HBASE-17125.master.021.patch, HBASE-17125.master.022.patch, 
> HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-08-10 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122653#comment-16122653
 ] 

Guanghao Zhang commented on HBASE-17125:


Thanks all for reviewing. Will commit it later if no other objections.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.018.patch, HBASE-17125.master.019.patch, 
> HBASE-17125.master.020.patch, HBASE-17125.master.020.patch, 
> HBASE-17125.master.021.patch, HBASE-17125.master.022.patch, 
> HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-08-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121908#comment-16121908
 ] 

Hadoop QA commented on HBASE-17125:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
37s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
32m 48s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
55s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}156m 
21s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}216m 13s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:bdc94b1 |
| JIRA Issue | HBASE-17125 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12881209/HBASE-17125.master.022.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux a96bed34faa6 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 
12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / 6246523 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/8015/testReport/ |
| modules | C: hbase-client hbase-server U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/8015/console |
| Powered by | Apache Yetus 0.4.0   http://yetus.apache.org |


This message was automatically generated.



> Inconsistent result when

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-08-07 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117703#comment-16117703
 ] 

Guanghao Zhang commented on HBASE-17125:


Hadoop QA passed. [~anoop.hbase] [~Apache9] [~tedyu] [~chia7712] Any more 
concerns?

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.018.patch, HBASE-17125.master.019.patch, 
> HBASE-17125.master.020.patch, HBASE-17125.master.020.patch, 
> HBASE-17125.master.021.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-08-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116334#comment-16116334
 ] 

Hadoop QA commented on HBASE-17125:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
28s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
25s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
44s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
32m 32s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
47s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}112m 
21s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}168m 32s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:bdc94b1 |
| JIRA Issue | HBASE-17125 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12880597/HBASE-17125.master.021.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 759ed675f525 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 
14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / fd76eb3 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7958/testReport/ |
| modules | C: hbase-client hbase-server U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7958/console |
| Powered by | Apache Yetus 0.4.0   http://yetus.apache.org |


This message was automatically generated.



> Inconsistent result when use filter

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-08-06 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116073#comment-16116073
 ] 

Hadoop QA commented on HBASE-17125:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
29s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
25s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
30s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
24s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
30m  8s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
5s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
18s{color} | {color:red} hbase-client generated 5 new + 0 unchanged - 0 fixed = 
5 total (was 0) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
40s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}108m 17s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}162m 25s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | 
org.apache.hadoop.hbase.master.procedure.TestDisableTableProcedure |
|   | org.apache.hadoop.hbase.master.procedure.TestModifyTableProcedure |
|   | org.apache.hadoop.hbase.master.procedure.TestCreateTableProcedure |
|   | org.apache.hadoop.hbase.master.procedure.TestEnableTableProcedure |
|   | org.apache.hadoop.hbase.master.procedure.TestServerCrashProcedure |
|   | org.apache.hadoop.hbase.master.procedure.TestDeleteTableProcedure |
|   | org.apache.hadoop.hbase.master.TestGetLastFlushedSequenceId |
|   | org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancer2 |
|   | org.apache.hadoop.hbase.master.TestAssignmentManagerMetrics |
|   | org.apache.hadoop.hbase.master.TestGetInfoPort |
|   | org.apache.hadoop.hbase.master.TestMasterFailoverBalancerPersistence |
|   | org.apache.hadoop.hbase.master.cleaner.TestSnapshotFromMaster |
|   | 
org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster |
|   | org.apache.hadoop.hbase.master.TestTableStateManager

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-08-03 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113788#comment-16113788
 ] 

Guanghao Zhang commented on HBASE-17125:


Ping [~anoop.hbase] [~Apache9] for reviewing.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.018.patch, HBASE-17125.master.019.patch, 
> HBASE-17125.master.020.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-08-03 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113785#comment-16113785
 ] 

Guanghao Zhang commented on HBASE-17125:


bq.  testXXXWithFilterHint and testXXXWithFilter should be removed because they 
can't reproduce the bug of top (cell) change
If no objections about 020 patch, I will remove them when prepare a final 
patch. Thanks.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.018.patch, HBASE-17125.master.019.patch, 
> HBASE-17125.master.020.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-08-03 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113784#comment-16113784
 ] 

Guanghao Zhang commented on HBASE-17125:


[~tedyu] Ok. I thought the slack idea is not easy to understand... Any 
concerns about the 020 patch?

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.018.patch, HBASE-17125.master.019.patch, 
> HBASE-17125.master.020.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-08-03 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113780#comment-16113780
 ] 

Guanghao Zhang commented on HBASE-17125:


The failed ut is not related. And it was tracked by HBASE-18425.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.018.patch, HBASE-17125.master.019.patch, 
> HBASE-17125.master.020.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-08-03 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113458#comment-16113458
 ] 

Ted Yu commented on HBASE-17125:


Please see my comment on Jun 26th.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.018.patch, HBASE-17125.master.019.patch, 
> HBASE-17125.master.020.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-08-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113410#comment-16113410
 ] 

Hadoop QA commented on HBASE-17125:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
38s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
25s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
54s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
29s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
34m 23s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
37s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
18s{color} | {color:red} hbase-client generated 5 new + 0 unchanged - 0 fixed = 
5 total (was 0) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
49s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}145m 27s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
 1s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}206m 14s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.TestMasterFailover |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:bdc94b1 |
| JIRA Issue | HBASE-17125 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12880203/HBASE-17125.master.020.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 088eafb945e5 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 
12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / fe890b7 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC3 |
| javadoc | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7913/artifact/patchprocess/diff-javadoc-javadoc-hbase-client.txt
 |
| unit |

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-08-03 Thread Chia-Ping Tsai (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112885#comment-16112885
 ] 

Chia-Ping Tsai commented on HBASE-17125:


As mentioned in HBASE-18295,
{quote}
The bugs caused by filter may be resolved by HBASE-17125 because the patch make 
matcher check the version before asking filter. If the SEEK_NEXT_COLUMN is 
returned, the filter.filterKeyValue isn't evaluated. Maybe we should push the 
HBASE-17125...FYI Guanghao Zhang
{quote}
>From my perspective, *testXXXWithFilterHint* and *testXXXWithFilter* should be 
>removed because they can't reproduce the bug of top (cell) change.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.018.patch, HBASE-17125.master.019.patch, 
> HBASE-17125.master.020.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-08-03 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112478#comment-16112478
 ] 

Guanghao Zhang commented on HBASE-17125:


Checked the failed ut. It was related with HBASE-18295. After this patch, we 
will check versions first then check filter. So after checkVersions return 
SEEK_NEXT_COLUMN, the filter.filterKeyValue isn't evaluated. When the 
currentSize is 2, then the "return ReturnCode.NEXT_ROW;" is not be called. And 
it will be called when we want to scan next row, so the ut failed.. 
[~chia7712] I changed the ut to make sure it will be called in one row. Please 
help to review the 020 patch. Thanks.
{code}
  public Filter.ReturnCode filterKeyValue(Cell v) throws IOException {
if (timeToGoNextRow.get()) {
  timeToGoNextRow.set(false);
  return ReturnCode.NEXT_ROW;
} else {
  return ReturnCode.INCLUDE;
}
  }
{code}

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.018.patch, HBASE-17125.master.019.patch, 
> HBASE-17125.master.020.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-08-01 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108697#comment-16108697
 ] 

Hadoop QA commented on HBASE-17125:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
32s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
36m 25s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
59s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
23s{color} | {color:red} hbase-client generated 5 new + 0 unchanged - 0 fixed = 
5 total (was 0) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
53s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}125m 42s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}189m 18s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestStore |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:bdc94b1 |
| JIRA Issue | HBASE-17125 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12879785/HBASE-17125.master.019.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 69efc5ab0096 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 
14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / a5db120 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC3 |
| javadoc | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7869/artifact/patchprocess/diff-javadoc-javadoc-hbase-client.txt
 |
| unit |

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-08-01 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108495#comment-16108495
 ] 

Guanghao Zhang commented on HBASE-17125:


Attach a 019 patch only change in SQM. Ping [~anoop.hbase] [~Apache9] for 
reviewing.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.018.patch, HBASE-17125.master.019.patch, 
> HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-07-31 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106879#comment-16106879
 ] 

Anoop Sam John commented on HBASE-17125:


What is the next step here? Any work based on Duo's latest suggestion?  Just 
asking as I remembered abt this issue.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.018.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-26 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063058#comment-16063058
 ] 

Hadoop QA commented on HBASE-17125:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
1s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 27s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
47s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 23s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
24s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
38s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 58s 
{color} | {color:red} hbase-client in master has 4 extant Findbugs warnings. 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 58s 
{color} | {color:red} hbase-client in master has 4 extant Findbugs warnings. 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 3m 8s 
{color} | {color:red} hbase-server in master has 10 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
51s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
34m 24s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
45s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 21s 
{color} | {color:red} hbase-client generated 5 new + 1 unchanged - 1 fixed = 6 
total (was 2) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 44s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 118m 33s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
57s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 181m 7s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.io.asyncfs.TestSaslFanOutOneBlockAsyncDFSOutput |
| Timed out junit tests | 
org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancer2 |
|   | org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancer |
|   | org.apache.hadoop.hbase.master.TestDistributedLogSplitting |
|   | org.apache.hadoop.hbase.snapshot.TestExportSnapshotNoCluster |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12874472/HBASE-17125.master.018.patch
 |
| JIRA Issue | HBASE-17125 |
| Optional Tests |

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-26 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063030#comment-16063030
 ] 

Ted Yu commented on HBASE-17125:


bq. The ut is TestFromClientSide#testReadWithFilter and 
TestHRegion#testGetWithFilter

I plugged in these two tests into 17125-slack-13.txt.
They passed.

Please double check.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.018.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-26 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062900#comment-16062900
 ] 

Duo Zhang commented on HBASE-17125:
---

Can we avoid changing the ColumnTracker interface? If you all really hate 
modifying the filter of SQM, then my bottom line is to implement the logic of 
FilterAwareColumnTracker inside SQM directly.

Thanks.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.018.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-26 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062871#comment-16062871
 ] 

Guanghao Zhang commented on HBASE-17125:


Attach a 018 patch.
bq. The following would allow the subtest to run alone:
Fixed it.
bq. How about calling this wrapper FilterAwareColumnTracker (or something like 
that) ?
Refactored.
bq. adding more tests on diff scenarios.
The ut is TestFromClientSide#testReadWithFilter and 
TestHRegion#testGetWithFilter.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.018.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-26 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062831#comment-16062831
 ] 

Anoop Sam John commented on HBASE-17125:


"FilterAwareColumnTracker "  -  Sounds good.
Ya +1 on adding more tests on diff scenarios. Thanks

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-25 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062382#comment-16062382
 ] 

Ted Yu commented on HBASE-17125:


Since existing unit tests didn't uncover the defect in 17125-slack-13.txt, 
please add new test to show that patch v17 is correct in this regard.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-24 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062150#comment-16062150
 ] 

Ted Yu commented on HBASE-17125:


{code}
+final WAL wal = 
HBaseTestingUtility.createWal(TEST_UTIL.getConfiguration(), logDir, info);
+this.region = TEST_UTIL.createLocalHRegion(info, htd, wal);
{code}
The above code relies on other test to initialize chunk creator.
If you run the subtest alone, you would observe NPE like the following:
{code}
MemStoreLABImpl.getOrMakeChunk() line: 242
MemStoreLABImpl.copyCellInto(Cell) line: 118
MutableSegment(Segment).maybeCloneWithAllocator(Cell) line: 168
CompactingMemStore(AbstractMemStore).maybeCloneWithAllocator(Cell) line: 268
CompactingMemStore(AbstractMemStore).add(Cell, MemstoreSize) line: 107
CompactingMemStore(AbstractMemStore).add(Iterable, MemstoreSize) line: 101
HStore.add(Iterable, MemstoreSize) line: 711
HRegion.applyToMemstore(Store, List, boolean, MemstoreSize) line: 4001
HRegion.applyFamilyMapToMemstore(Map, MemstoreSize) line: 
3984
HRegion.doMiniBatchMutate(BatchOperation) line: 3439
HRegion.batchMutate(BatchOperation) line: 3131
HRegion.batchMutate(Mutation[], long, long) line: 3073
HRegion.batchMutate(Mutation[]) line: 3077
HRegion.doBatchMutate(Mutation) line: 3827
HRegion.put(Put) line: 2950
TestHRegion.testGetWithFilter() line: 2665
{code}
The following would allow the subtest to run alone:
{code}
+ChunkCreator.initialize(MemStoreLABImpl.CHUNK_SIZE_DEFAULT, false, 0, 0, 
0, null);
+this.region = TEST_UTIL.createLocalHRegion(info, htd, wal);
{code}

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-24 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062138#comment-16062138
 ] 

Ted Yu commented on HBASE-17125:


{code}
+public class ColumnTrackerWrapper implements ColumnTracker {
{code}
There may be more wrapper for ColumnTracker in the future.
How about calling this wrapper ColumnTrackerWithVersions (or something like 
that) ?

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch, 
> HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-24 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062069#comment-16062069
 ] 

Hadoop QA commented on HBASE-17125:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 26s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
36s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 57s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
28s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
44s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 50s 
{color} | {color:red} hbase-client in master has 4 extant Findbugs warnings. 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 5m 54s 
{color} | {color:red} hbase-server in master has 10 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 30s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
58m 43s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 8m 2s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 33s 
{color} | {color:red} hbase-client generated 5 new + 1 unchanged - 0 fixed = 6 
total (was 1) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 41s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 33m 54s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 134m 51s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.locking.TestLockProcedure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12874373/HBASE-17125.master.017.patch
 |
| JIRA Issue | HBASE-17125 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 4c11bb685b75 4.8.3-std-1 #1 SMP Fri Oct 21 11:15:43 UTC 2016 
x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 96aca6b |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs |

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061779#comment-16061779
 ] 

Hadoop QA commented on HBASE-17125:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 36s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
50s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 33s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 
13s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
1s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 56s 
{color} | {color:red} hbase-client in master has 4 extant Findbugs warnings. 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 52s 
{color} | {color:red} hbase-client in master has 4 extant Findbugs warnings. 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 6m 31s 
{color} | {color:red} hbase-server in master has 10 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 5s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 26s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 8s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 8s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
55s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
58m 15s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 
47s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 33s 
{color} | {color:red} hbase-client generated 5 new + 1 unchanged - 1 fixed = 6 
total (was 2) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 40s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 36m 12s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
34s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 142m 30s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.filter.TestFilter |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12874342/HBASE-17125.master.016.patch
 |
| JIRA Issue | HBASE-17125 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 84ee331aa840 4.8.3-std-1 #1 SMP Fri Oct 21 11:15:43 UTC 2016 
x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-23 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061726#comment-16061726
 ] 

Ted Yu commented on HBASE-17125:


Based on patch v16:
{code}
testColumnPaginationFilter(org.apache.hadoop.hbase.filter.TestFilter)  Time 
elapsed: 0.165 sec  <<< FAILURE!
java.lang.AssertionError
at 
org.apache.hadoop.hbase.filter.TestFilter.verifyScan(TestFilter.java:1667)
at 
org.apache.hadoop.hbase.filter.TestFilter.testColumnPaginationFilter(TestFilter.java:1945)
...
testGet_Basic(org.apache.hadoop.hbase.regionserver.TestHRegion)  Time elapsed: 
0.099 sec  <<< FAILURE!
java.lang.AssertionError
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.testGet_Basic(TestHRegion.java:2616)
{code}
Can you double check ?

Thanks

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch, 
> HBASE-17125.master.016.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-23 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060614#comment-16060614
 ] 

Hadoop QA commented on HBASE-17125:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 32m 38s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 38s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
21s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 20s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 
49s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
31s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 36s 
{color} | {color:red} hbase-client in master has 4 extant Findbugs warnings. 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 47s 
{color} | {color:red} hbase-client in master has 4 extant Findbugs warnings. 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 7m 31s 
{color} | {color:red} hbase-server in master has 10 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 47s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 34s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
55s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
6s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
69m 6s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 8m 
57s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 52s 
{color} | {color:red} hbase-client generated 8 new + 1 unchanged - 1 fixed = 9 
total (was 2) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 48s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 46m 35s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
40s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 207m 48s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.filter.TestFilter |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12874200/HBASE-17125.master.014.patch
 |
| JIRA Issue | HBASE-17125 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux b7c5a776da35 4.8.3-std-1 #1 SMP Fri Oct 21 11:15:43 UTC 2016 
x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-23 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060456#comment-16060456
 ] 

Anoop Sam John commented on HBASE-17125:


So this patch is in line with some old proposal that u put right.. I like it..  
Doing some more careful reviews. Pls put the latest patch in RB

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-22 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060430#comment-16060430
 ] 

Anoop Sam John commented on HBASE-17125:


Reviewing HBASE-17125.master.014.patch now

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.014.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060309#comment-16060309
 ] 

Hadoop QA commented on HBASE-17125:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 3s 
{color} | {color:blue} The patch file was not named according to hbase's naming 
conventions. Please see 
https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for 
instructions. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 31s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
35s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 12s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
11s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
36s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 57s 
{color} | {color:red} hbase-client in master has 4 extant Findbugs warnings. 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 54s 
{color} | {color:red} hbase-client in master has 4 extant Findbugs warnings. 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 58s 
{color} | {color:red} hbase-server in master has 10 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
6s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
29m 2s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
44s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 17s 
{color} | {color:red} hbase-client generated 2 new + 1 unchanged - 1 fixed = 3 
total (was 2) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 24s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 117m 5s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
41s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 171m 11s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.master.procedure.TestMasterProcedureWalLease |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12874152/17125-slack-13.txt |
| JIRA Issue | HBASE-17125 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-22 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060247#comment-16060247
 ] 

Ted Yu commented on HBASE-17125:


Review board:
https://reviews.apache.org/r/60381/

This is just an alternate approach. Guanghao is the owner of this JIRA.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff, 
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch, 
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch, 
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch, 
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch, 
> HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-22 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060137#comment-16060137
 ] 

Guanghao Zhang commented on HBASE-17125:


[~tedyu] your "slack" idea is easy to have bug and has side effects. I try it 
in HBASE-17125.master.no-specified-filter.patch. 

bq. I have run test suite with updated aggregate patch which passed
As i said in email, "ut passed" doesn't mean that your patch is right... You 
send me a v11 patch("ut passed"), I pointed the bug in it. Then you fix it. But 
the latest patch which you send me is still wrong.. Please upload your 
patch here and review board. And let more people to review to make it right. 
Thanks.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.012.patch, 
> HBASE-17125.master.013.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-22 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059450#comment-16059450
 ] 

Ted Yu commented on HBASE-17125:


My aggregate patch v11 has defect in that it doesn't handle 
ExplicitColumnTracker#reset() correctly.

ColumnCount should have slack and retract() as well. 
ExplicitColumnTracker#retract() delegates to ColumnCount#retract().
When ExplicitColumnTracker#reset() is called, it resets the slack in each 
ColumnCount to the initial value.

I have run test suite with updated aggregate patch which passed (other than the 
tests flagged by flaky test dashboard).

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.012.patch, 
> HBASE-17125.master.013.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059123#comment-16059123
 ] 

Hadoop QA commented on HBASE-17125:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m 47s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 32s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 
16s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
53s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 1s 
{color} | {color:red} hbase-client in master has 4 extant Findbugs warnings. 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 44s 
{color} | {color:red} hbase-server in master has 10 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
1s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
31m 16s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 3m 26s 
{color} | {color:red} hbase-server generated 1 new + 10 unchanged - 0 fixed = 
11 total (was 10) {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 20s 
{color} | {color:red} hbase-client generated 8 new + 1 unchanged - 0 fixed = 9 
total (was 1) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 48s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 22m 1s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
16s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 95m 19s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
|  |  Should 
org.apache.hadoop.hbase.regionserver.querymatcher.UserScanQueryMatcher$SpecifiedNumVersionsColumnFilter
 be a _static_ inner class?  At UserScanQueryMatcher.java:inner class?  At 
UserScanQueryMatcher.java:[lines 272-299] |
| Failed junit tests | hadoop.hbase.filter.TestFilter |
|   | hadoop.hbase.io.encoding.TestSeekBeforeWithReverseScan |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.13.1 Server=1.13.1 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12874037/HBASE-17125.master.013.patch
 |
| JIRA Issue | HBASE-17125 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux f36cb54ba435

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059035#comment-16059035
 ] 

Hadoop QA commented on HBASE-17125:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
13s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 34s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
52s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
49s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 11s 
{color} | {color:red} hbase-client in master has 4 extant Findbugs warnings. 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 7m 56s 
{color} | {color:red} hbase-server in master has 10 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 30s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
51s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
1s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
59m 29s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 8m 
15s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 33s 
{color} | {color:red} hbase-client generated 8 new + 1 unchanged - 0 fixed = 9 
total (was 1) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 42s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 35m 17s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 35s 
{color} | {color:red} The patch generated 3 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 144m 10s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.filter.TestFilter |
|   | hadoop.hbase.master.locking.TestLockProcedure |
|   | hadoop.hbase.filter.TestInvocationRecordFilter |
|   | hadoop.hbase.io.encoding.TestSeekBeforeWithReverseScan |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12874024/HBASE-17125.master.012.patch
 |
| JIRA Issue | HBASE-17125 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux bfde76c5c6f2 4.8.3-std-1 #1 SMP Fri Oct 21 11:15:43 UTC 2016 
x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-22 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059000#comment-16059000
 ] 

Guanghao Zhang commented on HBASE-17125:


Attach 013 patch which remove the two new SQM to make code easy to read. And 
fix the bug in UserScanQueryMatcher internal.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.012.patch, 
> HBASE-17125.master.013.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-22 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058872#comment-16058872
 ] 

Guanghao Zhang commented on HBASE-17125:


Take a summary about the key points of this fix.
1. Easy to use for user, no need extra work.
2. No more complicated work for ColumnTracker. Only fix this when use filter.
3. Accurate javadoc. And mark scan's setMaxVersions as Deprecated.

Attach a 012 patch and upload to review board, too.
The new patch add two new SQM: UserScanWithFilterQueryMatcher and 
RawScanWithFilterQueryMatcher. Fix this bug in these SQM internal. If no 
filter, it should same with previous implementation. Thanks.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.012.patch, 
> HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058800#comment-16058800
 ] 

Duo Zhang commented on HBASE-17125:
---

But for most users they just do not use filter so I do not think it is a good 
idea to add the word 'filter' to the method name.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Phil Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058792#comment-16058792
 ] 

Phil Yang commented on HBASE-17125:
---

I think setVersions may still confused us and users. Call it 
setMaxVersionsAfterFilters? And I think we should add some comments here to 
tell users how we deal with cf's VERSIONS, filters and Scan#setMaxVersions

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058699#comment-16058699
 ] 

Ted Yu commented on HBASE-17125:


Based on patch v11 and the snippet I posted earlier, I have an aggregate patch 
which passes all Filter tests and the visibility test.
The complexity of the aggregate patch is on par with patch v11 (sans the 
SpecificNumberVersionsFilter).

Running thru whole test suite now.


> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058630#comment-16058630
 ] 

Anoop Sam John commented on HBASE-17125:


That makes sense Duo.. Already we have a FilterWrapper been used to wrap user 
side filters.  Pls check how this idea will turn out to be.. Thanks a lot 
Guanghao for ur perseverance.  Appreciate it !

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058623#comment-16058623
 ] 

Anoop Sam John commented on HBASE-17125:


bq.I think we can append a SpecificNumberVersionsFilter at the last if filter 
and versions are both present when initialize a scan at RS side.
U mean HBase code itself add it than asking the user to do this right?   Am 
fine with any approach (which seems out to be the best by all of us) which can 
help to solve this bug by HBase code itself.  (Than asking user to do some 
extra work)..  If this is really not possible, then only , as a last step ask 
the user to do some extra things.  
Will look at the patches and approaches.  Thanks guys

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058621#comment-16058621
 ] 

Duo Zhang commented on HBASE-17125:
---

Yeah I mean do it by ourselves, not by users.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058620#comment-16058620
 ] 

Anoop Sam John commented on HBASE-17125:


bq.I think we can append a SpecificNumberVersionsFilter at the last if filter 
and versions are both present when initialize a scan at RS side.
U mean HBase code itself add it than asking the user to do this right?   Am 
fine with any approach (which seems out to be the best by all of us) which can 
help to solve this bug by HBase code itself.  (Than asking user to do some 
extra work)..  If this is really not possible, then only , as a last step ask 
the user to do some extra things.  
Will look at the patches and approaches.  Thanks guys

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058594#comment-16058594
 ] 

Duo Zhang commented on HBASE-17125:
---

I think we can append a SpecificNumberVersionsFilter at the last if filter and 
versions are both present when initialize a scan at RS side. This way we do not 
need to modify the logic of ColumnTracker, the code is already complicated 
enough so do not put new stuffs to it. I still think the problem introduced by 
filter should also be addressed by filter.

And the name 'setVersions' does not make sense, change it to 'readAllVersions'? 
And how do you deal with raw scan?

Thanks.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058546#comment-16058546
 ] 

Hadoop QA commented on HBASE-17125:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 18m 48s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 
6s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
50s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 58s 
{color} | {color:red} hbase-client in master has 4 extant Findbugs warnings. 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 59s 
{color} | {color:red} hbase-server in master has 10 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
32m 58s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 6s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 19s 
{color} | {color:red} hbase-client generated 5 new + 1 unchanged - 0 fixed = 6 
total (was 1) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 46s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 22m 42s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 99m 12s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestKeepDeletes |
|   | hadoop.hbase.regionserver.querymatcher.TestUserScanQueryMatcher |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.13.1 Server=1.13.1 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12873971/HBASE-17125.master.checkReturnedVersions.patch
 |
| JIRA Issue | HBASE-17125 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 0c417fc24366 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 3489a1b |
| Default Java

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058476#comment-16058476
 ] 

Ted Yu commented on HBASE-17125:


Can you put the latest patch on review board ?

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058430#comment-16058430
 ] 

Guanghao Zhang commented on HBASE-17125:


bq. If the filter decide to skip a version, then reduce the returned count in 
ColumnTracker.
This method is too trick. And it is easy to have bug. So I upload a new patch 
(checkReturnedVersions.patch) which use the second idea in the description.
It have three steps to match column.
1. check the column family's max versions.
2. check by filter
3. check the returned versions. (This can be set by user).

About the setFilter()'s javadoc. It says "called AFTER all tests
for ttl, column match, deletes and max versions have been run." Talked with 
[~yangzhe1991] and [~Apache9], we thought the max versions is easy to 
misunderstanding. Because the column family has a max versions config and user 
can set a max versions to scan. So in the new patch, I update the javadoc of 
setFilter() method. The new javadoc is "called AFTER all tests for ttl, column 
match, deletes and column family's max versions have been run". And add a new 
method setVersions() for scan, which means how many versions will be returned 
to user. And add a @deprecated mark for setMaxVersions() method. Thanks. 

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.checkReturnedVersions.patch, 
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057556#comment-16057556
 ] 

Steve Loughran commented on HBASE-17125:


bq. Then why don't you implement it by yourself? If you think it is easy, then 
please implement it.

Generally, as a committer it's actually more productive to nurture other 
developers into working towards what you believe to be the right answer than do 
it yourself. As well as sharing some of your unrealistic set of deliverables 
with others, you can be the reviewer to gets the stuff in, instead of having a 
patch you have chase other people to review. Long term: the more people you can 
you can get to collaborate helps the project all round.

No opinions on the patch, just making sure everyone works together on this. 
Thanks.




> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057381#comment-16057381
 ] 

Hadoop QA commented on HBASE-17125:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 29s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
40s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
47s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 34s 
{color} | {color:red} hbase-server in master has 12 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
31m 9s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 124m 14s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 183m 11s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.security.visibility.TestVisibilityLabelsWithDeletes |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12873837/HBASE-17125.master.no-specified-filter.patch
 |
| JIRA Issue | HBASE-17125 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 4238af0c8af6 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 
09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 83be50c |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7276/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html
 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7276/artifact/patchprocess/patch-unit-hbase-server.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-HBASE-Build/7276/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7276/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7276/console |
| Powered by | Apache Yetus 0.3.0

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057373#comment-16057373
 ] 

Hadoop QA commented on HBASE-17125:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m 59s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 
40s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
56s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 3m 15s 
{color} | {color:red} hbase-server in master has 12 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
56s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
55s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
31m 31s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 119m 48s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
25s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 183m 5s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.security.visibility.TestVisibilityLabelsWithDeletes |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12873837/HBASE-17125.master.no-specified-filter.patch
 |
| JIRA Issue | HBASE-17125 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux ffcf4b97ea45 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 
14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 83be50c |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7275/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html
 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7275/artifact/patchprocess/patch-unit-hbase-server.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-HBASE-Build/7275/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7275/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7275/console |
| Powered by | Apache Yetus 0.3.0

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057276#comment-16057276
 ] 

Ted Yu commented on HBASE-17125:


When I ran TestImportTSVWithVisibilityLabels with the no-specified-filter 
patch, it hung.

Please put HBASE-17125.master.no-specified-filter.patch on review board.

Thanks

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057233#comment-16057233
 ] 

Guanghao Zhang commented on HBASE-17125:


bq. My request would be lets try that approach
I attach a HBASE-17125.master.no-specified-filter.patch for it. The new 
approach still check versions first, then check by filter. And ColumnTracker 
use a returnedVersions to track how many versions will be returned. So 
checkVerisons will check max version by column family's versions. And the 
returned versions will controlled by user's scan's maxVersions. If the filter 
decide to skip a version, then reduce the returned count in ColumnTracker.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057199#comment-16057199
 ] 

Guanghao Zhang commented on HBASE-17125:


bq. Can you try it out once Guanghao Zhang?
Ok. I will try to trigger the Hadoop QA again.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-20 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057017#comment-16057017
 ] 

ramkrishna.s.vasudevan commented on HBASE-17125:


I was following this discussion previously and lost track. I have seen this 
problem consistently arising. In all the different JIRAs related to this the 
fix inside the core was always having some side effects one way or the other.

For the Visibility labels and ACL comment from Anoop, I think if we remove the 
Logic in VisiblityLabelFilter's filterKV to get the max number of versions and 
just add the new filter at the end in VisibiityController and set 
scan.setMaxVersion() all the visibilty tests should still pass. Can you try it 
out once [~zghaobac]?

I think having a javadoc and a new filter for the user in case where he really 
wants multiple versions to be returned back even while filtering means he can 
try using the new filter provided (if it is covering all the cases) as the core 
side changes are minimal and most importantly bug free.
IMHO - it is fine with me to add a new filter and expose it to user for those 
who need this filter + version specific usage.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-20 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056975#comment-16056975
 ] 

Anoop Sam John commented on HBASE-17125:


Agree Duo.. We say in javadoc when we will call filter.  But if we can solve 
this issue in our code itself, that is great.  Ur concern of more complexity 
also valid only..  My request would be lets try that approach (I dont mind who 
do :-)  )  also and see.. How complex or not it is.  Any perf impact or not.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-20 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056973#comment-16056973
 ] 

Anoop Sam John commented on HBASE-17125:


What Ted 2 reasons why I fear abt the way of asking new Filter usage..   I did 
not check his suggestion fully wrt code and side effects if any..  But IMHO we 
should try it out..  With out much perf penalty if we can do, why not!  If the 
code itself can handle this that is always better and users will be much more 
happy(Than they have to remember and use another filter)..  Lets be open for 
all possible ways. My request he would be pls see possible ways with which we 
can solve it on our own..  All agree here that the present situation is a bug 
and really bad behave from system. We never give back deterministic results. 
(All depends on other factors like compaction done or not etc)

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-20 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056932#comment-16056932
 ] 

Duo Zhang commented on HBASE-17125:
---

And I need to say again, we have never break the javadoc of setMaxVersions. If 
you do not use filtr, then the behavior is correct. If you use filter, then 
this is what the javadoc of setFilter says, filter is tested at last.

{quote}
  /**
   * Apply the specified server-side filter when performing the Query.
   * Only {@link Filter#filterKeyValue(org.apache.hadoop.hbase.Cell)} is called 
AFTER all tests
   * for ttl, column match, deletes and max versions have been run.
   * @param filter filter to run on the server
   * @return this for invocation chaining
   */
{quote}

Filter always introduces behaviors which are not intuitive because it is too 
flexible. For example, PageFilter may still return more rows than configured. 
You need to know the details of HBase if you want to use filter correctly. 
That's why I want to fix the problem in a simpler way rather than a complicated 
way. It does not make any big difference to users. For normal users they just 
do not care, and for advanced users they must know the implementation details.

Thanks.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-20 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056895#comment-16056895
 ] 

Duo Zhang commented on HBASE-17125:
---

We have already discussed many times on how to implement it, either in our 
company or here. 

{quote}
 which doesn't add much more complexity
{quote}
Then why don't you implement it by yourself? If you think it is easy, then 
please implement it.

And your approach will lose the ability to control the versions passed to 
filter. And it can not be addressed by adding a new filter at the beginning, 
because you will always reset the version count if the filter list returns SKIP.

Thanks.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-20 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056871#comment-16056871
 ] 

Ted Yu commented on HBASE-17125:


The reasons Anoop and I don't favor SpecifiedNumVersionsColumnFilter are:

* it is not intuitive.
* user may use it incorrectly (conside nested FilterLists).

Guanghao has agreed to try the new approach which doesn't add much more 
complexity on top of what is posted so far. Let's give him some time.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-20 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056860#comment-16056860
 ] 

Duo Zhang commented on HBASE-17125:
---

And we have already speak out at HBaseCon, there is no big concerns. I think 
most users just do not care about it as usually we only have one version, just 
like the user asked on mailing list.

Thanks.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-20 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056839#comment-16056839
 ] 

Duo Zhang commented on HBASE-17125:
---

{quote}
Am still not in favor of asking the user to configure some extra Filter to get 
an expected behave from the system
{quote}

I'd say again that the javadoc never guarantee the current behavior and no 
doubt it is a broken semantic. And see my comment above,  I think use another 
filter to address the problem introduced by filter is the right direction. We 
should not put too many complexities to our core system.

And see my comment above, a real user case which shows that the current 
approach can solve his/her problem

{quote}
Oh, seems the user calls setMaxVerions to 1. I believe the problem is that 
he/she found that the filter will return old values then he/she use 
setMaxVersions(1) and hope this could solve the problem.
So it is clear that in this user's mind, setMaxVersions should be used to 
control the number of versions passed to the filter. This is exactly what we 
provide in the latest patch. With the patch in place, the user does not need to 
call setMaxVersions(1) anymore.
Thanks.
{quote}

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-20 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056801#comment-16056801
 ] 

Ted Yu commented on HBASE-17125:


When I tested my tentative patch, TestImportTSVWithVisibilityLabels hung.

TestImportTSVWithVisibilityLabels#testBulkOutputWithTsvImporterTextMapperWithInvalidLabels
 seems to be the last subtest running before I killed the surefire process.

FYI

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-20 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056759#comment-16056759
 ] 

Guanghao Zhang commented on HBASE-17125:


bq. Here is snippet showing the concept of slack:
bq. IMHO we should address this issue on our own (Like what Ted is trying to 
suggest here).
Ok. Let me try it.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-20 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056042#comment-16056042
 ] 

Anoop Sam John commented on HBASE-17125:


IMHO we should address this issue on our own (Like what Ted is trying to 
suggest here).  Am still not in favor of asking the user to configure some 
extra Filter to get an expected behave from the system.  Can we pls think in 
that direction too.  Ya ways were mentioned above also.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-20 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055765#comment-16055765
 ] 

Ted Yu commented on HBASE-17125:


Here is snippet showing the concept of slack:
https://pastebin.com/diBC8B9M

In UserScanQueryMatcher, around line 163, the call to retract() should be added:
{code}
  switch (filterResponse) {
case SKIP:
  if (colChecker == MatchCode.INCLUDE) {
columns.retract();
return MatchCode.SKIP;
{code}
The above is tentative - certain unit tests don't pass yet.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-19 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055179#comment-16055179
 ] 

Guanghao Zhang commented on HBASE-17125:


The findbugs and whitespace were introduced by generated protobuf. And the 
failed tests were not rleated. They passed locally. Thanks.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-19 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055141#comment-16055141
 ] 

Guanghao Zhang commented on HBASE-17125:


bq. which test in the patch exercises the above scenario ?
The SpecifiedNumVersionsColumnFilter control how many versions column will be 
returned. Add a unit test 
TestFromClientSide#testSpecifiedNumVersionsColumnFilter for it. Thanks.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-19 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054123#comment-16054123
 ] 

Ted Yu commented on HBASE-17125:


For my previous comment:
https://issues.apache.org/jira/browse/HBASE-17125?focusedCommentId=15976972=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15976972

which test in the patch exercises the above scenario ?

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-19 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053863#comment-16053863
 ] 

Hadoop QA commented on HBASE-17125:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 27s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
48s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 56s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 16m 
58s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
45s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 26s 
{color} | {color:red} hbase-protocol-shaded in master has 24 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 12m 
51s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
37s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
32m 42s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 1m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 7s 
{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} hbase-client generated 0 new + 1 unchanged - 1 fixed = 
1 total (was 2) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s 
{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 32s 
{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 41s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 125m 7s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
54s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 220m 29s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.security.token.TestZKSecretWatcher |
|   | hadoop.hbase.master.procedure.TestMasterProcedureWalLease |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-19 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053587#comment-16053587
 ] 

Guanghao Zhang commented on HBASE-17125:


Add a 011 patch which rebase the latest master branch. [~anoop.hbase] [~stack] 
Any more concerns? If no objections, I will commit it tomorrow. Thanks.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-05-11 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006261#comment-16006261
 ] 

Guanghao Zhang commented on HBASE-17125:


Ping [~anoop.hbase] [~lhofhansl] [~stack]...

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-05-04 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15996287#comment-15996287
 ] 

Guanghao Zhang commented on HBASE-17125:


[~anoop.hbase] Any more concerns?

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-05-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15996244#comment-15996244
 ] 

Hadoop QA commented on HBASE-17125:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 27s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
42s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 32s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 11m 
53s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
34s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 13s 
{color} | {color:red} hbase-protocol-shaded in master has 24 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 11m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
34s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
34m 17s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 1m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 
52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 32s 
{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 29s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 128m 59s 
{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
59s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 217m 40s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12866316/HBASE-17125.master.010.patch
 |
| JIRA Issue | HBASE-17125 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  cc  hbaseprotoc  |
| uname | Linux 860de2b1a771 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 
15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-04-24 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980756#comment-15980756
 ] 

Duo Zhang commented on HBASE-17125:
---

'If not user filter,' -> 'If not use filter'.

Any other concerns? [~anoop.hbase].

Thanks.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-04-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978817#comment-15978817
 ] 

Hadoop QA commented on HBASE-17125:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 37s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 27s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
37s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 33s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 11m 
42s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
34s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 15s 
{color} | {color:red} hbase-protocol-shaded in master has 24 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 11m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
34s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
33m 29s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 1m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 31s 
{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 28s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 121m 53s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
51s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 208m 56s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.snapshot.TestMobExportSnapshot |
|   | hadoop.hbase.snapshot.TestExportSnapshot |
|   | hadoop.hbase.snapshot.TestMobSecureExportSnapshot |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12864479/HBASE-17125.master.009.patch
 |
| JIRA Issue | HBASE-17125 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  cc  hbaseprotoc  |
| uname | Linux

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-04-21 Thread Chia-Ping Tsai (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978806#comment-15978806
 ] 

Chia-Ping Tsai commented on HBASE-17125:


TestWalAndCompactingMemStoreFlush is traced by HBASE-17943.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-04-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978609#comment-15978609
 ] 

Hadoop QA commented on HBASE-17125:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
26s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 24s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 11m 
2s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
32s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 5s 
{color} | {color:red} hbase-protocol-shaded in master has 24 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 11m 
2s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
28m 38s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 1m 
2s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 16s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 112m 22s 
{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
48s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 190m 27s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.regionserver.TestWalAndCompactingMemStoreFlush |
|   | hadoop.hbase.snapshot.TestExportSnapshot |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12864458/HBASE-17125.master.009.patch
 |
| JIRA Issue | HBASE-17125 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  cc  hbaseprotoc  |
| uname | Linux 2f81ad5ac9e7 3.13.0-106-generic #153-Ubuntu SMP Tue

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-04-21 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978505#comment-15978505
 ] 

Guanghao Zhang commented on HBASE-17125:


Update the javadoc in the latest 009 patch.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-04-21 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978373#comment-15978373
 ] 

Duo Zhang commented on HBASE-17125:
---

If not use filter, get up to the specified number of versions of each column.

If use filter, it means the maximum versions of each column will be checked by 
filter. So the scan may return less than the value you set here as the filter 
may filter out some cells. If you want to get a specific number of version for 
each column after filtering, please call setMaxVersions() and use 
SpecifiedNumVersionsColumnFilter. Notice that the 
SpecifiedNumVersionsColumnFilter should be placed at the last position in 
FilterList to make sure it will be checked at last.

Better? And seems get also has this method so you need to change the wording 
from scan to get if possible?

Thanks.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-04-21 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978169#comment-15978169
 ] 

Guanghao Zhang commented on HBASE-17125:


The right javadoc of setMaxVersions is:
/**
* If not use filter, get up to the specified number of versions of each column. 

* If use filter, it means the maximum versions of each column will be checked 
by filter. So the scan
* maybe return less than maximum versions for each column. But you can add a 
SpecifiedNumVersionsColumnFilter
* to get the specified number of versions of each column.
* @param maxVersions maximum versions for each column
* @return this  
*/

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-04-20 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978099#comment-15978099
 ] 

Duo Zhang commented on HBASE-17125:
---

Oh, seems the user calls setMaxVerions to 1. I believe the problem is that 
he/she found that the filter will return old values then he/she use 
setMaxVersions(1) and hope this could solve the problem.

So it is clear that in this user's mind, setMaxVersions should be used to 
control the number of versions passed to the filter. This is exactly what we 
provide in the latest patch. With the patch in place, the user does not need to 
call setMaxVersions(1) anymore.

Thanks.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-04-20 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978098#comment-15978098
 ] 

Guanghao Zhang commented on HBASE-17125:


For the user mailing list case, the column's version is 1. So the user didn't 
need to setMaxVersions(). If the recent value is not matching, he will gets 
nothing.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-04-20 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978092#comment-15978092
 ] 

Duo Zhang commented on HBASE-17125:
---

{quote}
So there also the user has to set this filter on get/scan and setMaxversions()?
{quote}
No. After the patch here the problem is gone. Just keep the code, everything 
will be OK.

And the problem of this fix is that we can not use setMaxVerions to control the 
number of returned versions if you user filter(maybe). So we introduce a 
SpecifiedNumVersionsColumnFilter to solve the problem.

Thanks.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-04-20 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978087#comment-15978087
 ] 

Anoop Sam John commented on HBASE-17125:


In user@ mailing list there was a query from a user regarding similar issue 
while using value filter.   The recent value of a cell is not matching the 
value as per filter still he gets a older version of the cell.. He needs only 
one version (latest). So there also the user has to set this filter on get/scan 
and setMaxversions()?

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-04-20 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978085#comment-15978085
 ] 

Duo Zhang commented on HBASE-17125:
---

{quote}
So setMaxVersions with any value >= 5 (not only 5), then the server can check 
all versions.
{quote}

Yes, just call scan.setMaxVerions(), do not need to give a specific value.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-04-20 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978080#comment-15978080
 ] 

Guanghao Zhang commented on HBASE-17125:


bq. It will so complicated for a user to set this as 5 and then the filter with 
3. 
The default version is 1. So the scan will only check the latest version. The 
user need set a bigger value if he want read more than one version. This 
scenario: the column's versions is 5. So setMaxVersions with any value >= 5 
(not only 5), then the server can check all versions.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-04-20 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978078#comment-15978078
 ] 

Duo Zhang commented on HBASE-17125:
---

Why users need to setMaxVerions to 5 if they only want 3 versions? The fact is, 
if you do not use filter, then just use setMaxVersions to control the number of 
the veraions returned. If you use filter, and then please use 
SpecifiedNumVersionsColumnFilter if you want to control the number of versions 
returned as the max versions will be tested before filter. The setMaxVersions 
is used to control the number of versions passed to filter. I think this is 
clear enough?

Thanks.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-04-20 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978067#comment-15978067
 ] 

Anoop Sam John commented on HBASE-17125:


I dont think it is correct to ask users to do setMaxVersions(5).  It will so 
complicated for a user to set this as 5 and then the filter with 3.  This is 
like we pass our impl headache to user.  IMHO.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-04-20 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978000#comment-15978000
 ] 

Guanghao Zhang commented on HBASE-17125:


bq. If user calls scan#setMaxVersions(5), server would check more versions 
(than 3). However, there is a chance that more than 3 versions would be 
returned.
This can be addressed by scan.setFilter(new 
SpecifiedNumVersionsColumnFilter(3)). setMaxVersions means how many version 
will be check. And SpecifiedNumVersionsColumnFilter means how many versions 
will be returned.

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-04-20 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977993#comment-15977993
 ] 

Ted Yu commented on HBASE-17125:


Let's look at the scenario again:

bq. if a column's max version is 5 and the user query only need 3 versions

If user calls scan#setMaxVersions(5), server would check more versions (than 
3). However, there is a chance that more than 3 versions would be returned.
Instead of letting user deal with the slack, it would be better to handle this 
server side.

My proposal only involves a few lines of change to your latest patch - though 
there may be some unit test failure(s).

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-04-20 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977981#comment-15977981
 ] 

Guanghao Zhang commented on HBASE-17125:


bq. Can we pass this information (let's call it the slack) to ColumnTracker 
ctor ?
The javadoc of scan.setMaxVersions has been changed. If user set max versions 
is less than the column's max versions, it means user didn't want to check all 
versions. So I thought we don't need this information?

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-04-20 Thread Guanghao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977972#comment-15977972
 ] 

Guanghao Zhang commented on HBASE-17125:


bq. How is the above addressed in the current patch ?
Now scan.setMaxVerrsions means how many versions will be check. So this can be 
addressed by scan.setMaxVersions(5).

> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-04-20 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976972#comment-15976972
 ] 

Ted Yu commented on HBASE-17125:


bq. has another problem, if a column's max version is 5 and the user query only 
need 3 versions. It first check the version's number, then check the cell by 
filter. So the cells number of the result may less than 3. But there are 2 
versions which don't read anymore.

How is the above addressed in the current patch ?

Currently the max versions is obtained this way (see UserScanQueryMatcher):
{code}
int maxVersions = scan.isRaw() ? scan.getMaxVersions()
: Math.min(scan.getMaxVersions(), scanInfo.getMaxVersions());
{code}
The column tracker loses some information when column's max versions is greater 
than that specified in the Scan.
Can we pass this information to ColumnTracker so that the column tracker can 
return richer information (thru a tuple, e.g.) from checkVersions() ?
{code}
colChecker = columns.checkVersions(cell, timestamp, typeByte, false);
{code}
That way, when filterResponse is SKIP, we can utilize the extra information to 
address the scenario described above.


> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch
>
>
> Assume a cloumn's max versions is 3, then we write 4 versions of this column. 
> The oldest version doesn't remove immediately. But from the user view, the 
> oldest version has gone. When user use a filter to query, if the filter skip 
> a new version, then the oldest version will be seen again. But after compact 
> the region, then the oldest version will never been seen. So it is weird for 
> user. The query will get inconsistent result before and after region 
> compaction.
> The reason is matchColumn method of UserScanQueryMatcher. It first check the 
> cell by filter, then check the number of versions needed. So if the filter 
> skip the new version, then the oldest version will be seen again when it is 
> not removed.
> Have a discussion offline with [~Apache9] and [~fenghh], now we have two 
> solution for this problem. The first idea is check the number of versions 
> first, then check the cell by filter. As the comment of setFilter, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem, if a column's max version is 5 and the 
> user query only need 3 versions. It first check the version's number, then 
> check the cell by filter. So the cells number of the result may less than 3. 
> But there are 2 versions which don't read anymore.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which user need.
> But this will lead the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcomed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

1 2 >

1 - 100 of 141 matches

Mail list logo