[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131125#comment-16131125 ]

Hudson commented on HBASE-17125:

FAILURE: Integrated in Jenkins build HBASE-14070.HLC #233 (See [https://builds.apache.org/job/HBASE-14070.HLC/233/])
HBASE-17125 Inconsistent result when use filter to read data (zghao: rev 4dd24c52b84c74a477e00ab6177d081c29462dd8)
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/Get.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanQueryMatcher.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/UserScanQueryMatcher.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/Query.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanWildcardColumnTracker.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java

> Inconsistent result when use filter to read data
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
> Issue Type: Bug
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Critical
> Fix For: 2.0.0, 3.0.0
>
> Attachments: 17125-slack-13.txt, example.diff,
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch,
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch,
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch,
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch,
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch,
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch,
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch,
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch,
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch,
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch,
> HBASE-17125.master.018.patch, HBASE-17125.master.019.patch,
> HBASE-17125.master.020.patch, HBASE-17125.master.020.patch,
> HBASE-17125.master.021.patch, HBASE-17125.master.022.patch,
> HBASE-17125.master.checkReturnedVersions.patch,
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a column's max versions is 3, and we write 4 versions of this column.
> The oldest version is not removed immediately, but from the user's view it is
> already gone. When the user queries with a filter and the filter skips a
> newer version, the oldest version becomes visible again. After the region is
> compacted, the oldest version is never seen again. So the query returns
> inconsistent results before and after region compaction, which is confusing
> for the user.
> The cause is the matchColumn method of UserScanQueryMatcher. It first checks
> the cell against the filter, then checks the number of versions needed. So if
> the filter skips a newer version, the oldest version is returned as long as
> it has not yet been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two
> candidate solutions. The first idea is to check the number of versions first,
> then check the cell against the filter. This matches the comment on
> setFilter, which says the filter is called after all tests for ttl, column
> match, deletes and max versions have been run.
> {code}
>   /**
>    * Apply the specified server-side filter when performing the Query.
>    * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>    * for ttl, column match, deletes and max versions have been run.
>    * @param filter filter to run on the server
>    * @return this for invocation chaining
>    */
>   public Query setFilter(Filter filter) {
>     this.filter = filter;
>     return this;
>   }
> {code}
> But this idea has another problem. If a column's max versions is 5 and the
> user's query needs only 3 versions, the matcher first checks the version
> count, then checks the cell against the filter. So the result may contain
> fewer than 3 cells, even though there are 2 more versions that were never
> read.
> So the second idea has three steps:
> 1. check against the max versions of this column
> 2. check the kv against the filter
> 3. check the number of versions the user needs.
> But this makes ScanQueryMatcher more complicated, and it breaks the javadoc
> of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
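The ordering problem in the issue description can be reproduced with a small toy model. The sketch below is not HBase code: the class `VersionFilterOrderSketch`, its method names, and the use of bare timestamps in place of cells are all assumptions made for illustration. It models one column as a newest-first list of timestamps and compares the original order (filter, then version count), the first idea (version count, then filter), and the three-step idea (column max versions, filter, user's versions).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongPredicate;

/** Toy model of the check ordering discussed in HBASE-17125; hypothetical, not HBase code. */
public class VersionFilterOrderSketch {

  /** Original order: apply the filter first, then count the versions the user asked for. */
  static List<Long> filterThenVersions(List<Long> newestFirst, int userVersions,
      LongPredicate filter) {
    List<Long> out = new ArrayList<>();
    for (long ts : newestFirst) {
      if (!filter.test(ts)) continue;          // cell skipped by the filter, not counted
      if (out.size() >= userVersions) break;   // enough versions returned
      out.add(ts);
    }
    return out;
  }

  /** First idea: count versions first (up to limit), then apply the filter. */
  static List<Long> versionsThenFilter(List<Long> newestFirst, int limit,
      LongPredicate filter) {
    List<Long> out = new ArrayList<>();
    int seen = 0;
    for (long ts : newestFirst) {
      if (++seen > limit) break;               // version budget used up before filtering
      if (filter.test(ts)) out.add(ts);
    }
    return out;
  }

  /** Second idea: column max versions, then the filter, then the user's version count. */
  static List<Long> threeStep(List<Long> newestFirst, int columnMaxVersions,
      int userVersions, LongPredicate filter) {
    List<Long> out = new ArrayList<>();
    int seen = 0;
    for (long ts : newestFirst) {
      if (++seen > columnMaxVersions) break;   // step 1: logically expired versions
      if (!filter.test(ts)) continue;          // step 2: filter
      if (out.size() >= userVersions) break;   // step 3: versions the user needs
      out.add(ts);
    }
    return out;
  }

  public static void main(String[] args) {
    // Max versions 3, four versions written; the filter skips the newest one (ts 4).
    LongPredicate skipTs4 = ts -> ts != 4L;
    List<Long> beforeCompaction = Arrays.asList(4L, 3L, 2L, 1L);
    List<Long> afterCompaction = Arrays.asList(4L, 3L, 2L);
    System.out.println(filterThenVersions(beforeCompaction, 3, skipTs4));
    System.out.println(filterThenVersions(afterCompaction, 3, skipTs4));
    System.out.println(threeStep(beforeCompaction, 3, 3, skipTs4));
    System.out.println(threeStep(afterCompaction, 3, 3, skipTs4));

    // Max versions 5, user asks for 3; the filter skips the newest one (ts 5).
    LongPredicate skipTs5 = ts -> ts != 5L;
    List<Long> five = Arrays.asList(5L, 4L, 3L, 2L, 1L);
    System.out.println(versionsThenFilter(five, 3, skipTs5));
    System.out.println(threeStep(five, 5, 3, skipTs5));
  }
}
```

In this model the filter-first order returns timestamp 1 before compaction but not after it, while the three-step order returns the same cells in both cases. The second scenario shows why counting the user's versions before filtering can return fewer than 3 cells even though older versions were still available.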
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123164#comment-16123164 ]

Chia-Ping Tsai commented on HBASE-17125:

+1
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123083#comment-16123083 ]

Hudson commented on HBASE-17125:

FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #3511 (See [https://builds.apache.org/job/HBase-Trunk_matrix/3511/])
HBASE-17125 Inconsistent result when use filter to read data (zghao: rev 4dd24c52b84c74a477e00ab6177d081c29462dd8)
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/Get.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/Query.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanWildcardColumnTracker.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanQueryMatcher.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/UserScanQueryMatcher.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122891#comment-16122891 ]

Hudson commented on HBASE-17125:

FAILURE: Integrated in Jenkins build HBase-2.0 #308 (See [https://builds.apache.org/job/HBase-2.0/308/])
HBASE-17125 Inconsistent result when use filter to read data (zghao: rev 8197a31bbc4c49b4edfc2a0f01b3ef29b40e268d)
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/Query.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanQueryMatcher.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/Get.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/UserScanQueryMatcher.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanWildcardColumnTracker.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122841#comment-16122841 ]

Guanghao Zhang commented on HBASE-17125:

Pushed to master and branch-2.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122653#comment-16122653 ]

Guanghao Zhang commented on HBASE-17125:

Thanks all for reviewing. Will commit it later if no other objections.

> Inconsistent result when use filter to read data
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
> Issue Type: Bug
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Critical
> Fix For: 2.0.0
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121908#comment-16121908 ]

Hadoop QA commented on HBASE-17125:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 20s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
| 0 | mvndep | 0m 37s | Maven dependency ordering for branch |
| +1 | mvninstall | 3m 52s | master passed |
| +1 | compile | 1m 5s | master passed |
| +1 | checkstyle | 0m 52s | master passed |
| +1 | mvneclipse | 0m 27s | master passed |
| +1 | findbugs | 4m 40s | master passed |
| +1 | javadoc | 1m 2s | master passed |
| 0 | mvndep | 0m 18s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 14s | the patch passed |
| +1 | compile | 1m 7s | the patch passed |
| +1 | javac | 1m 7s | the patch passed |
| +1 | checkstyle | 0m 52s | the patch passed |
| +1 | mvneclipse | 0m 26s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 32m 48s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. |
| +1 | findbugs | 5m 5s | the patch passed |
| +1 | javadoc | 0m 56s | the patch passed |
| +1 | unit | 2m 55s | hbase-client in the patch passed. |
| +1 | unit | 156m 21s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 32s | The patch does not generate ASF License warnings. |
| | | 216m 13s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:bdc94b1 |
| JIRA Issue | HBASE-17125 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12881209/HBASE-17125.master.022.patch |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux a96bed34faa6 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
| git revision | master / 6246523 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/8015/testReport/ |
| modules | C: hbase-client hbase-server U: . |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/8015/console |
| Powered by | Apache Yetus 0.4.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117703#comment-16117703 ]

Guanghao Zhang commented on HBASE-17125:

Hadoop QA passed. [~anoop.hbase] [~Apache9] [~tedyu] [~chia7712] Any more concerns?

> Inconsistent result when use filter to read data
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
> Issue Type: Bug
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff,
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch,
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch,
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch,
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch,
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch,
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch,
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch,
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch,
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch,
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch,
> HBASE-17125.master.018.patch, HBASE-17125.master.019.patch,
> HBASE-17125.master.020.patch, HBASE-17125.master.020.patch,
> HBASE-17125.master.021.patch, HBASE-17125.master.checkReturnedVersions.patch,
> HBASE-17125.master.no-specified-filter.patch
>
> Assume a column's max versions is 3, and we write 4 versions of this column.
> The oldest version is not removed immediately, but from the user's view it
> is gone. When a user queries with a filter and the filter skips a newer
> version, the oldest version becomes visible again. After the region is
> compacted, the oldest version is never seen again. This is confusing for
> users: the query returns inconsistent results before and after region
> compaction.
>
> The cause is the matchColumn method of UserScanQueryMatcher. It first checks
> the cell against the filter, then checks the number of versions needed. So
> if the filter skips a newer version, the oldest version is returned again
> as long as it has not been removed.
>
> After an offline discussion with [~Apache9] and [~fenghh], we have two
> candidate solutions. The first idea is to check the number of versions
> first, then check the cell against the filter. This matches the comment on
> setFilter, which says the filter is called after all tests for ttl, column
> match, deletes and max versions have been run.
> {code}
>   /**
>    * Apply the specified server-side filter when performing the Query.
>    * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>    * for ttl, column match, deletes and max versions have been run.
>    * @param filter filter to run on the server
>    * @return this for invocation chaining
>    */
>   public Query setFilter(Filter filter) {
>     this.filter = filter;
>     return this;
>   }
> {code}
> But this idea has another problem. If a column's max versions is 5 and the
> user's query needs only 3 versions, it first checks the version count and
> then checks the cell against the filter, so the result may contain fewer
> than 3 cells even though there are 2 more versions that are never read.
>
> So the second idea has three steps:
> 1. check against the max versions of this column
> 2. check the kv against the filter
> 3. check the number of versions the user needs
>
> But this makes ScanQueryMatcher more complicated, and it breaks the javadoc
> of Query.setFilter.
>
> We don't have a final solution for this problem yet. Suggestions are welcome.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
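The ordering problem described in the issue can be illustrated with a small stand-alone sketch. This is a toy model only, not the HBase code: a column is modelled as a list of version timestamps (newest first), the filter is a plain predicate, and the class and method names (VersionCheckOrder, filterThenCount, countThenFilter, threeStep) are hypothetical. It only shows how the order of the version check and the filter check changes what a reader sees before and after compaction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongPredicate;

// Toy model: none of these names exist in HBase, they are assumptions for illustration.
final class VersionCheckOrder {

  // Order described as the cause of the bug: filter first, then count versions.
  // Cells skipped by the filter are not counted, so an expired version can reappear.
  static List<Long> filterThenCount(List<Long> newestFirst, int maxVersions, LongPredicate filter) {
    List<Long> out = new ArrayList<>();
    for (long ts : newestFirst) {
      if (!filter.test(ts)) continue;
      if (out.size() >= maxVersions) break;
      out.add(ts);
    }
    return out;
  }

  // First idea: count versions first, then apply the filter.
  static List<Long> countThenFilter(List<Long> newestFirst, int maxVersions, LongPredicate filter) {
    List<Long> out = new ArrayList<>();
    int seen = 0;
    for (long ts : newestFirst) {
      if (seen >= maxVersions) break;
      seen++;
      if (filter.test(ts)) out.add(ts);
    }
    return out;
  }

  // Second idea: column max versions, then filter, then the versions the user asked for.
  static List<Long> threeStep(List<Long> newestFirst, int columnMax, int userMax, LongPredicate filter) {
    List<Long> out = new ArrayList<>();
    int seen = 0;
    for (long ts : newestFirst) {
      if (seen >= columnMax) break;      // step 1: column max versions
      seen++;
      if (!filter.test(ts)) continue;    // step 2: filter
      if (out.size() >= userMax) break;  // step 3: versions requested by the user
      out.add(ts);
    }
    return out;
  }

  public static void main(String[] args) {
    // Max versions 3, four versions written; compaction drops the oldest one.
    List<Long> beforeCompaction = Arrays.asList(4L, 3L, 2L, 1L);
    List<Long> afterCompaction = Arrays.asList(4L, 3L, 2L);
    LongPredicate skipNewest = ts -> ts != 4L;

    System.out.println(filterThenCount(beforeCompaction, 3, skipNewest)); // [3, 2, 1]
    System.out.println(filterThenCount(afterCompaction, 3, skipNewest));  // [3, 2]
    System.out.println(countThenFilter(beforeCompaction, 3, skipNewest)); // [3, 2]
    System.out.println(countThenFilter(afterCompaction, 3, skipNewest));  // [3, 2]

    // Column max versions 5, user asks for 3, filter skips the newest version.
    List<Long> five = Arrays.asList(5L, 4L, 3L, 2L, 1L);
    LongPredicate skipFive = ts -> ts != 5L;
    System.out.println(countThenFilter(five, 3, skipFive)); // [4, 3]
    System.out.println(threeStep(five, 5, 3, skipFive));    // [4, 3, 2]
  }
}
```

In this model, filtering first returns version 1 before compaction but not after, which is the inconsistency reported. Counting first is stable across compaction but can return fewer cells than the user asked for, and the three-step variant avoids both at the cost of an extra check.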
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116334#comment-16116334 ]

Hadoop QA commented on HBASE-17125:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
| 0 | mvndep | 0m 28s | Maven dependency ordering for branch |
| +1 | mvninstall | 3m 26s | master passed |
| +1 | compile | 0m 54s | master passed |
| +1 | checkstyle | 0m 48s | master passed |
| +1 | mvneclipse | 0m 25s | master passed |
| +1 | findbugs | 3m 44s | master passed |
| +1 | javadoc | 0m 45s | master passed |
| 0 | mvndep | 0m 16s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 2s | the patch passed |
| +1 | compile | 0m 55s | the patch passed |
| +1 | javac | 0m 55s | the patch passed |
| +1 | checkstyle | 0m 49s | the patch passed |
| +1 | mvneclipse | 0m 24s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 32m 32s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. |
| +1 | findbugs | 4m 36s | the patch passed |
| +1 | javadoc | 0m 56s | the patch passed |
| +1 | unit | 2m 47s | hbase-client in the patch passed. |
| +1 | unit | 112m 21s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 30s | The patch does not generate ASF License warnings. |
| | | 168m 32s | |

|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:bdc94b1 |
| JIRA Issue | HBASE-17125 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12880597/HBASE-17125.master.021.patch |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 759ed675f525 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / fd76eb3 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/7958/testReport/ |
| modules | C: hbase-client hbase-server U: . |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7958/console |
| Powered by | Apache Yetus 0.4.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116073#comment-16116073 ]

Hadoop QA commented on HBASE-17125:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
| 0 | mvndep | 0m 29s | Maven dependency ordering for branch |
| +1 | mvninstall | 3m 25s | master passed |
| +1 | compile | 1m 1s | master passed |
| +1 | checkstyle | 0m 52s | master passed |
| +1 | mvneclipse | 0m 30s | master passed |
| +1 | findbugs | 4m 24s | master passed |
| +1 | javadoc | 0m 48s | master passed |
| 0 | mvndep | 0m 17s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 7s | the patch passed |
| +1 | compile | 0m 57s | the patch passed |
| +1 | javac | 0m 57s | the patch passed |
| +1 | checkstyle | 0m 47s | the patch passed |
| +1 | mvneclipse | 0m 25s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 30m 8s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. |
| +1 | findbugs | 4m 5s | the patch passed |
| -1 | javadoc | 0m 18s | hbase-client generated 5 new + 0 unchanged - 0 fixed = 5 total (was 0) |
| +1 | unit | 2m 40s | hbase-client in the patch passed. |
| -1 | unit | 108m 17s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 35s | The patch does not generate ASF License warnings. |
| | | 162m 25s | |

|| Reason || Tests ||
| Timed out junit tests | org.apache.hadoop.hbase.master.procedure.TestDisableTableProcedure |
| | org.apache.hadoop.hbase.master.procedure.TestModifyTableProcedure |
| | org.apache.hadoop.hbase.master.procedure.TestCreateTableProcedure |
| | org.apache.hadoop.hbase.master.procedure.TestEnableTableProcedure |
| | org.apache.hadoop.hbase.master.procedure.TestServerCrashProcedure |
| | org.apache.hadoop.hbase.master.procedure.TestDeleteTableProcedure |
| | org.apache.hadoop.hbase.master.TestGetLastFlushedSequenceId |
| | org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancer2 |
| | org.apache.hadoop.hbase.master.TestAssignmentManagerMetrics |
| | org.apache.hadoop.hbase.master.TestGetInfoPort |
| | org.apache.hadoop.hbase.master.TestMasterFailoverBalancerPersistence |
| | org.apache.hadoop.hbase.master.cleaner.TestSnapshotFromMaster |
| | org.apache.hadoop.hbase.master.normalizer.TestSimpleRegionNormalizerOnCluster |
| | org.apache.hadoop.hbase.master.TestTableStateManager
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113788#comment-16113788 ]

Guanghao Zhang commented on HBASE-17125:

Ping [~anoop.hbase] [~Apache9] for reviewing.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113785#comment-16113785 ]

Guanghao Zhang commented on HBASE-17125:

bq. testXXXWithFilterHint and testXXXWithFilter should be removed because they can't reproduce the bug of top (cell) change

If no objections about 020 patch, I will remove them when prepare a final patch. Thanks.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113784#comment-16113784 ]

Guanghao Zhang commented on HBASE-17125:

[~tedyu] Ok. I thought the slack idea is not easy to understand... Any concerns about the 020 patch?
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113780#comment-16113780 ]

Guanghao Zhang commented on HBASE-17125:

The failed ut is not related. And it was tracked by HBASE-18425.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113458#comment-16113458 ]

Ted Yu commented on HBASE-17125:

Please see my comment on Jun 26th.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113410#comment-16113410 ]

Hadoop QA commented on HBASE-17125:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 23s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
| 0 | mvndep | 0m 38s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 25s | master passed |
| +1 | compile | 1m 5s | master passed |
| +1 | checkstyle | 0m 54s | master passed |
| +1 | mvneclipse | 0m 29s | master passed |
| +1 | findbugs | 4m 8s | master passed |
| +1 | javadoc | 0m 47s | master passed |
| 0 | mvndep | 0m 16s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 6s | the patch passed |
| +1 | compile | 0m 57s | the patch passed |
| +1 | javac | 0m 57s | the patch passed |
| +1 | checkstyle | 0m 48s | the patch passed |
| +1 | mvneclipse | 0m 26s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 34m 23s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. |
| +1 | findbugs | 4m 37s | the patch passed |
| -1 | javadoc | 0m 18s | hbase-client generated 5 new + 0 unchanged - 0 fixed = 5 total (was 0) |
| +1 | unit | 2m 49s | hbase-client in the patch passed. |
| -1 | unit | 145m 27s | hbase-server in the patch failed. |
| +1 | asflicense | 1m 1s | The patch does not generate ASF License warnings. |
| | | 206m 14s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.TestMasterFailover |

|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:bdc94b1 |
| JIRA Issue | HBASE-17125 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12880203/HBASE-17125.master.020.patch |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 088eafb945e5 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / fe890b7 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC3 |
| javadoc | https://builds.apache.org/job/PreCommit-HBASE-Build/7913/artifact/patchprocess/diff-javadoc-javadoc-hbase-client.txt |
| unit |
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112885#comment-16112885 ]

Chia-Ping Tsai commented on HBASE-17125:

As mentioned in HBASE-18295,
{quote}
The bugs caused by the filter may be resolved by HBASE-17125, because the patch makes the matcher check the version before asking the filter. If SEEK_NEXT_COLUMN is returned, filter.filterKeyValue isn't evaluated. Maybe we should push HBASE-17125... FYI Guanghao Zhang
{quote}
From my perspective, *testXXXWithFilterHint* and *testXXXWithFilter* should be removed because they can't reproduce the bug of the top (cell) change.

> Inconsistent result when use filter to read data
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
> Issue Type: Bug
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff,
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch,
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch,
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch,
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch,
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch,
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch,
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch,
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch,
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch,
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch,
> HBASE-17125.master.018.patch, HBASE-17125.master.019.patch,
> HBASE-17125.master.020.patch, HBASE-17125.master.checkReturnedVersions.patch,
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a column's max versions is 3 and we write 4 versions of this column.
> The oldest version isn't removed immediately, but from the user's view it is
> gone. When a user queries with a filter and the filter skips a newer version,
> the oldest version becomes visible again. After the region is compacted,
> however, the oldest version is never seen again. This is confusing for users:
> the query returns inconsistent results before and after region compaction.
> The cause is the matchColumn method of UserScanQueryMatcher. It first checks
> the cell against the filter, then checks the number of versions needed. So if
> the filter skips a newer version, the oldest version is seen again as long as
> it has not been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two
> solutions for this problem. The first idea is to check the number of versions
> first, then check the cell against the filter. This matches the comment on
> setFilter, which says the filter is called after all tests for ttl, column
> match, deletes and max versions have been run.
> {code}
> /**
>  * Apply the specified server-side filter when performing the Query.
>  * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>  * for ttl, column match, deletes and max versions have been run.
>  * @param filter filter to run on the server
>  * @return this for invocation chaining
>  */
> public Query setFilter(Filter filter) {
>   this.filter = filter;
>   return this;
> }
> {code}
> But this idea has another problem. If a column's max versions is 5 and the
> user query only needs 3 versions, it first checks the version count, then
> checks the cell against the filter. So the result may contain fewer than 3
> cells, even though there are 2 more versions that are never read.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which the user needs.
> But this will make the ScanQueryMatcher more complicated, and it will break
> the javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
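The orderings discussed in the issue description can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: it does not use any HBase classes, cells are modelled as bare timestamps sorted newest first, and the class and method names (`VersionFilterOrdering`, `filterThenVersions`, `versionsThenFilter`, `threeStep`) are hypothetical, not part of the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical model: a column is a list of cell timestamps, newest first.
public class VersionFilterOrdering {

    // Reported behaviour: ask the filter first, then count versions
    // only among the cells that passed the filter.
    public static List<Long> filterThenVersions(List<Long> cells, int maxVersions,
                                                LongPredicate filter) {
        List<Long> out = new ArrayList<>();
        for (long ts : cells) {
            if (!filter.test(ts)) continue;         // filter skips this cell
            if (out.size() == maxVersions) break;   // enough versions returned
            out.add(ts);
        }
        return out;
    }

    // First idea: count versions first; only the newest maxVersions
    // cells ever reach the filter.
    public static List<Long> versionsThenFilter(List<Long> cells, int maxVersions,
                                                LongPredicate filter) {
        List<Long> out = new ArrayList<>();
        int seen = 0;
        for (long ts : cells) {
            if (++seen > maxVersions) break;        // beyond the version limit
            if (filter.test(ts)) out.add(ts);
        }
        return out;
    }

    // Second idea: (1) column max versions, (2) filter, (3) versions requested.
    public static List<Long> threeStep(List<Long> cells, int columnMaxVersions,
                                       int requestedVersions, LongPredicate filter) {
        List<Long> out = new ArrayList<>();
        int seen = 0;
        for (long ts : cells) {
            if (++seen > columnMaxVersions) break;       // step 1
            if (!filter.test(ts)) continue;              // step 2
            out.add(ts);
            if (out.size() == requestedVersions) break;  // step 3
        }
        return out;
    }

    public static void main(String[] args) {
        // Column max versions 3, four versions written, filter skips ts=3.
        List<Long> four = Arrays.asList(4L, 3L, 2L, 1L);
        LongPredicate skip3 = ts -> ts != 3L;
        System.out.println(filterThenVersions(four, 3, skip3));  // [4, 2, 1]
        System.out.println(versionsThenFilter(four, 3, skip3));  // [4, 2]

        // Column max versions 5, query asks for 3, filter skips ts=5.
        List<Long> five = Arrays.asList(5L, 4L, 3L, 2L, 1L);
        LongPredicate skip5 = ts -> ts != 5L;
        System.out.println(versionsThenFilter(five, 3, skip5));  // [4, 3]
        System.out.println(threeStep(five, 5, 3, skip5));        // [4, 3, 2]
    }
}
```

In the first scenario the filter-first order returns timestamp 1, the version that is supposed to be gone, while the version-first order never shows it. In the second scenario the version-first order returns only two cells although the user asked for three, which is the gap the three-step variant is meant to close.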
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112478#comment-16112478 ]

Guanghao Zhang commented on HBASE-17125:

Checked the failed ut. It is related to HBASE-18295. After this patch, we check versions first and then check the filter. So after checkVersions returns SEEK_NEXT_COLUMN, filter.filterKeyValue isn't evaluated. When the currentSize is 2, the "return ReturnCode.NEXT_ROW;" is not reached. It is reached instead when we scan the next row, so the ut failed. [~chia7712] I changed the ut to make sure it is called within one row. Please help review the 020 patch. Thanks.
{code}
public Filter.ReturnCode filterKeyValue(Cell v) throws IOException {
  if (timeToGoNextRow.get()) {
    timeToGoNextRow.set(false);
    return ReturnCode.NEXT_ROW;
  } else {
    return ReturnCode.INCLUDE;
  }
}
{code}
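One reading of the unit-test failure described in the comment above can be sketched without HBase. The names here are assumptions for illustration only: `ReturnCode` is a two-value stand-in for `Filter.ReturnCode`, and `match` is a toy matcher that ends the column before consulting the filter once the version limit is reached (the role `checkVersions` returning `SEEK_NEXT_COLUMN` plays in the comment).

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class PendingNextRowSketch {
    // Hypothetical stand-in for HBase's Filter.ReturnCode.
    public enum ReturnCode { INCLUDE, NEXT_ROW }

    // Flag the test flips when it wants the scan to move to the next row.
    public static final AtomicBoolean timeToGoNextRow = new AtomicBoolean(false);

    // Same shape as the test filter quoted in the comment.
    public static ReturnCode filterKeyValue() {
        if (timeToGoNextRow.get()) {
            timeToGoNextRow.set(false);
            return ReturnCode.NEXT_ROW;
        }
        return ReturnCode.INCLUDE;
    }

    // Toy matcher (assumption): versions are checked first, and a cell past
    // the limit ends the column without the filter being asked. Returns null
    // when the filter was not consulted.
    public static ReturnCode match(int versionIndex, int maxVersions) {
        if (versionIndex >= maxVersions) {
            return null;
        }
        return filterKeyValue();
    }

    public static void main(String[] args) {
        ReturnCode first = match(0, 1);        // filter consulted: INCLUDE
        timeToGoNextRow.set(true);             // test now requests NEXT_ROW
        ReturnCode second = match(1, 1);       // version limit hit: filter skipped
        ReturnCode nextRow = match(0, 1);      // request fires on the following row
        System.out.println(first + " " + second + " " + nextRow);
    }
}
```

Under these assumptions the pending request survives the skipped cell and is consumed only by the first cell of the following row, which is consistent with the comment's explanation of why the test had to be changed so that the filter is called within one row.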
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108697#comment-16108697 ]

Hadoop QA commented on HBASE-17125:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 14s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
| 0 | mvndep | 0m 18s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 0s | master passed |
| +1 | compile | 1m 11s | master passed |
| +1 | checkstyle | 0m 58s | master passed |
| +1 | mvneclipse | 0m 31s | master passed |
| +1 | findbugs | 4m 32s | master passed |
| +1 | javadoc | 0m 59s | master passed |
| 0 | mvndep | 0m 20s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 20s | the patch passed |
| +1 | compile | 1m 13s | the patch passed |
| +1 | javac | 1m 13s | the patch passed |
| +1 | checkstyle | 0m 57s | the patch passed |
| +1 | mvneclipse | 0m 32s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 36m 25s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. |
| +1 | findbugs | 4m 59s | the patch passed |
| -1 | javadoc | 0m 23s | hbase-client generated 5 new + 0 unchanged - 0 fixed = 5 total (was 0) |
| +1 | unit | 2m 53s | hbase-client in the patch passed. |
| -1 | unit | 125m 42s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 189m 18s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestStore |

|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:bdc94b1 |
| JIRA Issue | HBASE-17125 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12879785/HBASE-17125.master.019.patch |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 69efc5ab0096 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / a5db120 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC3 |
| javadoc | https://builds.apache.org/job/PreCommit-HBASE-Build/7869/artifact/patchprocess/diff-javadoc-javadoc-hbase-client.txt |
| unit |
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108495#comment-16108495 ]

Guanghao Zhang commented on HBASE-17125:

Attached a 019 patch that only changes SQM. Ping [~anoop.hbase] [~Apache9] for review.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106879#comment-16106879 ]

Anoop Sam John commented on HBASE-17125:

What is the next step here? Any work based on Duo's latest suggestion? Just asking as I remembered this issue.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063058#comment-16063058 ]

Hadoop QA commented on HBASE-17125:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 15s | Docker mode activated. |
| +1 | hbaseanti | 0m 1s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
| 0 | mvndep | 0m 27s | Maven dependency ordering for branch |
| +1 | mvninstall | 3m 47s | master passed |
| +1 | compile | 1m 23s | master passed |
| +1 | checkstyle | 1m 24s | master passed |
| +1 | mvneclipse | 0m 38s | master passed |
| -1 | findbugs | 0m 58s | hbase-client in master has 4 extant Findbugs warnings. |
| -1 | findbugs | 0m 58s | hbase-client in master has 4 extant Findbugs warnings. |
| -1 | findbugs | 3m 8s | hbase-server in master has 10 extant Findbugs warnings. |
| +1 | javadoc | 1m 11s | master passed |
| 0 | mvndep | 0m 16s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 12s | the patch passed |
| +1 | compile | 1m 2s | the patch passed |
| +1 | javac | 1m 2s | the patch passed |
| +1 | checkstyle | 0m 51s | the patch passed |
| +1 | mvneclipse | 0m 30s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 34m 24s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. |
| +1 | findbugs | 4m 45s | the patch passed |
| -1 | javadoc | 0m 21s | hbase-client generated 5 new + 1 unchanged - 1 fixed = 6 total (was 2) |
| +1 | unit | 2m 44s | hbase-client in the patch passed. |
| -1 | unit | 118m 33s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 57s | The patch does not generate ASF License warnings. |
| | | 181m 7s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.io.asyncfs.TestSaslFanOutOneBlockAsyncDFSOutput |
| Timed out junit tests | org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancer2 |
| | org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancer |
| | org.apache.hadoop.hbase.master.TestDistributedLogSplitting |
| | org.apache.hadoop.hbase.snapshot.TestExportSnapshotNoCluster |

|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12874472/HBASE-17125.master.018.patch |
| JIRA Issue | HBASE-17125 |
| Optional Tests |
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063030#comment-16063030 ]

Ted Yu commented on HBASE-17125:

bq. The ut is TestFromClientSide#testReadWithFilter and TestHRegion#testGetWithFilter
I plugged these two tests into 17125-slack-13.txt. They passed. Please double check.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062900#comment-16062900 ]

Duo Zhang commented on HBASE-17125:
-----------------------------------

Can we avoid changing the ColumnTracker interface? If you all really hate modifying the filter of SQM, then my bottom line is to implement the logic of FilterAwareColumnTracker inside SQM directly. Thanks.

> Inconsistent result when use filter to read data
> ------------------------------------------------
>
>                 Key: HBASE-17125
>                 URL: https://issues.apache.org/jira/browse/HBASE-17125
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>            Priority: Critical
>             Fix For: 3.0.0, 2.0.0-alpha-2
>
>         Attachments: 17125-slack-13.txt, example.diff,
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch,
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch,
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch,
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch,
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch,
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch,
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch,
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch,
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch,
> HBASE-17125.master.016.patch, HBASE-17125.master.017.patch,
> HBASE-17125.master.018.patch, HBASE-17125.master.checkReturnedVersions.patch,
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a column's max versions is 3 and we write 4 versions of this column.
> The oldest version is not removed immediately, but from the user's view it
> is gone. When a user queries with a filter and the filter skips a newer
> version, the oldest version becomes visible again. After the region is
> compacted, the oldest version will never be seen. This is confusing for
> users: the query returns inconsistent results before and after region
> compaction.
> The reason is the matchColumn method of UserScanQueryMatcher. It first checks
> the cell against the filter, then checks the number of versions needed. So if
> the filter skips a newer version, the oldest version is seen again as long as
> it has not been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two
> solutions for this problem. The first idea is to check the number of versions
> first, then check the cell against the filter. As the javadoc of setFilter
> says, the filter is called after all tests for ttl, column match, deletes and
> max versions have been run.
> {code}
>   /**
>    * Apply the specified server-side filter when performing the Query.
>    * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>    * for ttl, column match, deletes and max versions have been run.
>    * @param filter filter to run on the server
>    * @return this for invocation chaining
>    */
>   public Query setFilter(Filter filter) {
>     this.filter = filter;
>     return this;
>   }
> {code}
> But this idea has another problem: if a column's max versions is 5 and the
> user query only needs 3 versions, it first checks the version count, then
> checks the cell against the filter. So the number of cells in the result may
> be less than 3, even though there are 2 more versions that are never read.
> So the second idea has three steps.
> 1. Check by the max versions of this column.
> 2. Check the kv by filter.
> 3. Check the versions which the user needs.
> But this will make the ScanQueryMatcher more complicated, and it will break
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcome.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
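For readers following the thread, the check orders discussed in the issue description can be modelled with a small toy program. This is only a sketch under stated assumptions: it is not HBase code, the class and method names (VersionCheckOrderDemo, filterThenVersions, versionsThenFilter, threeStep) are hypothetical, and a cell is reduced to a bare timestamp listed newest first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

/** Toy model of one column's cells. NOT HBase code; every name here is hypothetical. */
final class VersionCheckOrderDemo {

    /** Behaviour described in the issue: apply the filter first, then count versions. */
    static List<Long> filterThenVersions(long[] newestFirst, LongPredicate filter, int wanted) {
        List<Long> out = new ArrayList<>();
        for (long ts : newestFirst) {
            if (!filter.test(ts)) continue;   // a skipped cell does not use up a version slot
            if (out.size() == wanted) break;
            out.add(ts);
        }
        return out;
    }

    /** First idea: count versions first, then apply the filter. */
    static List<Long> versionsThenFilter(long[] newestFirst, LongPredicate filter, int wanted) {
        List<Long> out = new ArrayList<>();
        int seen = 0;
        for (long ts : newestFirst) {
            if (++seen > wanted) break;       // every cell uses up a slot, filtered or not
            if (filter.test(ts)) out.add(ts);
        }
        return out;
    }

    /** Second idea: column max versions, then filter, then versions the user asked for. */
    static List<Long> threeStep(long[] newestFirst, LongPredicate filter, int columnMax, int wanted) {
        List<Long> out = new ArrayList<>();
        int seen = 0;
        for (long ts : newestFirst) {
            if (++seen > columnMax) break;    // step 1: hide cells beyond the column's max versions
            if (!filter.test(ts)) continue;   // step 2: apply the filter
            if (out.size() == wanted) break;  // step 3: stop at the requested version count
            out.add(ts);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] beforeCompaction = {4, 3, 2, 1}; // max versions is 3, cell 1 not yet removed
        long[] afterCompaction = {4, 3, 2};
        LongPredicate skipNewest = ts -> ts != 4;

        System.out.println(filterThenVersions(beforeCompaction, skipNewest, 3)); // [3, 2, 1]
        System.out.println(filterThenVersions(afterCompaction, skipNewest, 3));  // [3, 2]
        System.out.println(threeStep(beforeCompaction, skipNewest, 3, 3));       // [3, 2]

        long[] fiveVersions = {5, 4, 3, 2, 1};  // max versions is 5, user asks for 3
        LongPredicate skipFive = ts -> ts != 5;
        System.out.println(versionsThenFilter(fiveVersions, skipFive, 3));       // [4, 3]
        System.out.println(threeStep(fiveVersions, skipFive, 5, 3));             // [4, 3, 2]
    }
}
```

In this model the filter-first order returns the stale cell 1 before compaction but not after it, which is the inconsistency reported. The versions-first order returns only two cells in the max-versions-5 case, and the three-step order gives the same answer before and after compaction while still filling the requested three versions.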
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062871#comment-16062871 ]

Guanghao Zhang commented on HBASE-17125:
----------------------------------------

Attach a 018 patch.

bq. The following would allow the subtest to run alone:
Fixed it.

bq. How about calling this wrapper FilterAwareColumnTracker (or something like that) ?
Refactored.

bq. adding more tests on diff scenarios.
The ut is TestFromClientSide#testReadWithFilter and TestHRegion#testGetWithFilter.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062831#comment-16062831 ]

Anoop Sam John commented on HBASE-17125:
----------------------------------------

"FilterAwareColumnTracker " - Sounds good.
Ya +1 on adding more tests on diff scenarios. Thanks
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062382#comment-16062382 ]

Ted Yu commented on HBASE-17125:
--------------------------------

Since existing unit tests didn't uncover the defect in 17125-slack-13.txt, please add new test to show that patch v17 is correct in this regard.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062150#comment-16062150 ]

Ted Yu commented on HBASE-17125:
--------------------------------

{code}
+final WAL wal = HBaseTestingUtility.createWal(TEST_UTIL.getConfiguration(), logDir, info);
+this.region = TEST_UTIL.createLocalHRegion(info, htd, wal);
{code}
The above code relies on another test to initialize the chunk creator.
If you run the subtest alone, you would observe an NPE like the following:
{code}
MemStoreLABImpl.getOrMakeChunk() line: 242
MemStoreLABImpl.copyCellInto(Cell) line: 118
MutableSegment(Segment).maybeCloneWithAllocator(Cell) line: 168
CompactingMemStore(AbstractMemStore).maybeCloneWithAllocator(Cell) line: 268
CompactingMemStore(AbstractMemStore).add(Cell, MemstoreSize) line: 107
CompactingMemStore(AbstractMemStore).add(Iterable, MemstoreSize) line: 101
HStore.add(Iterable, MemstoreSize) line: 711
HRegion.applyToMemstore(Store, List, boolean, MemstoreSize) line: 4001
HRegion.applyFamilyMapToMemstore(Map, MemstoreSize) line: 3984
HRegion.doMiniBatchMutate(BatchOperation) line: 3439
HRegion.batchMutate(BatchOperation) line: 3131
HRegion.batchMutate(Mutation[], long, long) line: 3073
HRegion.batchMutate(Mutation[]) line: 3077
HRegion.doBatchMutate(Mutation) line: 3827
HRegion.put(Put) line: 2950
TestHRegion.testGetWithFilter() line: 2665
{code}
The following would allow the subtest to run alone:
{code}
+ChunkCreator.initialize(MemStoreLABImpl.CHUNK_SIZE_DEFAULT, false, 0, 0, 0, null);
+this.region = TEST_UTIL.createLocalHRegion(info, htd, wal);
{code}
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062138#comment-16062138 ]

Ted Yu commented on HBASE-17125:
--------------------------------

{code}
+public class ColumnTrackerWrapper implements ColumnTracker {
{code}
There may be more wrappers for ColumnTracker in the future.
How about calling this wrapper ColumnTrackerWithVersions (or something like that)?
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16062069#comment-16062069 ]

Hadoop QA commented on HBASE-17125:
-----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 23s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
| 0 | mvndep | 0m 26s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 36s | master passed |
| +1 | compile | 1m 57s | master passed |
| +1 | checkstyle | 1m 28s | master passed |
| +1 | mvneclipse | 0m 44s | master passed |
| -1 | findbugs | 1m 50s | hbase-client in master has 4 extant Findbugs warnings. |
| -1 | findbugs | 5m 54s | hbase-server in master has 10 extant Findbugs warnings. |
| +1 | javadoc | 1m 30s | master passed |
| 0 | mvndep | 0m 25s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 12s | the patch passed |
| +1 | compile | 2m 2s | the patch passed |
| +1 | javac | 2m 2s | the patch passed |
| +1 | checkstyle | 1m 28s | the patch passed |
| +1 | mvneclipse | 0m 43s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 58m 43s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. |
| +1 | findbugs | 8m 2s | the patch passed |
| -1 | javadoc | 0m 33s | hbase-client generated 5 new + 1 unchanged - 0 fixed = 6 total (was 1) |
| +1 | unit | 3m 41s | hbase-client in the patch passed. |
| -1 | unit | 33m 54s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 26s | The patch does not generate ASF License warnings. |
| | | 134m 51s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.locking.TestLockProcedure |

|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12874373/HBASE-17125.master.017.patch |
| JIRA Issue | HBASE-17125 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 4c11bb685b75 4.8.3-std-1 #1 SMP Fri Oct 21 11:15:43 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 96aca6b |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs |
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061779#comment-16061779 ]

Hadoop QA commented on HBASE-17125:
-----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 20s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
| 0 | mvndep | 0m 36s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 50s | master passed |
| +1 | compile | 2m 33s | master passed |
| +1 | checkstyle | 2m 13s | master passed |
| +1 | mvneclipse | 1m 1s | master passed |
| -1 | findbugs | 1m 56s | hbase-client in master has 4 extant Findbugs warnings. |
| -1 | findbugs | 1m 52s | hbase-client in master has 4 extant Findbugs warnings. |
| -1 | findbugs | 6m 31s | hbase-server in master has 10 extant Findbugs warnings. |
| +1 | javadoc | 2m 5s | master passed |
| 0 | mvndep | 0m 26s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 11s | the patch passed |
| +1 | compile | 2m 8s | the patch passed |
| +1 | javac | 2m 8s | the patch passed |
| +1 | checkstyle | 1m 45s | the patch passed |
| +1 | mvneclipse | 0m 55s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 58m 15s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. |
| +1 | findbugs | 7m 47s | the patch passed |
| -1 | javadoc | 0m 33s | hbase-client generated 5 new + 1 unchanged - 1 fixed = 6 total (was 2) |
| +1 | unit | 3m 40s | hbase-client in the patch passed. |
| -1 | unit | 36m 12s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 142m 30s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.filter.TestFilter |

|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12874342/HBASE-17125.master.016.patch |
| JIRA Issue | HBASE-17125 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 84ee331aa840 4.8.3-std-1 #1 SMP Fri Oct 21 11:15:43 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061726#comment-16061726 ]

Ted Yu commented on HBASE-17125:

Based on patch v16:
{code}
testColumnPaginationFilter(org.apache.hadoop.hbase.filter.TestFilter)  Time elapsed: 0.165 sec  <<< FAILURE!
java.lang.AssertionError
  at org.apache.hadoop.hbase.filter.TestFilter.verifyScan(TestFilter.java:1667)
  at org.apache.hadoop.hbase.filter.TestFilter.testColumnPaginationFilter(TestFilter.java:1945)
...
testGet_Basic(org.apache.hadoop.hbase.regionserver.TestHRegion)  Time elapsed: 0.099 sec  <<< FAILURE!
java.lang.AssertionError
  at org.apache.hadoop.hbase.regionserver.TestHRegion.testGet_Basic(TestHRegion.java:2616)
{code}
Can you double check? Thanks

> Inconsistent result when use filter to read data
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
> Issue Type: Bug
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: 17125-slack-13.txt, example.diff,
> HBASE-17125.master.001.patch, HBASE-17125.master.002.patch,
> HBASE-17125.master.002.patch, HBASE-17125.master.003.patch,
> HBASE-17125.master.004.patch, HBASE-17125.master.005.patch,
> HBASE-17125.master.006.patch, HBASE-17125.master.007.patch,
> HBASE-17125.master.008.patch, HBASE-17125.master.009.patch,
> HBASE-17125.master.009.patch, HBASE-17125.master.010.patch,
> HBASE-17125.master.011.patch, HBASE-17125.master.011.patch,
> HBASE-17125.master.012.patch, HBASE-17125.master.013.patch,
> HBASE-17125.master.014.patch, HBASE-17125.master.015.patch,
> HBASE-17125.master.016.patch, HBASE-17125.master.checkReturnedVersions.patch,
> HBASE-17125.master.no-specified-filter.patch
>
> Assume a column's max versions is 3, and we write 4 versions of this column.
> The oldest version is not removed immediately, but from the user's view it
> is gone. When a user queries with a filter and the filter skips a newer
> version, the oldest version becomes visible again. After the region is
> compacted, the oldest version is never seen again. This is confusing for
> users: the query returns inconsistent results before and after region
> compaction.
> The cause is the matchColumn method of UserScanQueryMatcher. It first checks
> the cell against the filter, then checks the number of versions needed. So if
> the filter skips a newer version, the oldest version is seen again as long as
> it has not been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two
> solutions for this problem. The first idea is to check the number of versions
> first, then check the cell against the filter. This matches the comment on
> setFilter, which says the filter is called after all tests for ttl, column
> match, deletes and max versions have been run.
> {code}
>   /**
>    * Apply the specified server-side filter when performing the Query.
>    * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>    * for ttl, column match, deletes and max versions have been run.
>    * @param filter filter to run on the server
>    * @return this for invocation chaining
>    */
>   public Query setFilter(Filter filter) {
>     this.filter = filter;
>     return this;
>   }
> {code}
> But this idea has another problem. If a column's max versions is 5 and the
> user query only needs 3 versions, it first checks the number of versions, then
> checks the cell against the filter. So the result may contain fewer than 3
> cells, even though there are 2 more versions that are never read.
> So the second idea has three steps:
> 1. check against the max versions of this column
> 2. check the kv against the filter
> 3. check the number of versions the user needs.
> But this makes the ScanQueryMatcher more complicated, and it breaks the
> javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
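For readers following the thread, the effect of the check order described in the issue can be reproduced with a small toy model. This is only a sketch under stated assumptions: versions are plain integers listed newest first, the class and method names (MatchOrderSketch, filterThenCount, countThenFilter, threeStep) are hypothetical, and none of this is the actual UserScanQueryMatcher code or any of the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

/** Toy model only: versions are ints, newest first. Not HBase code. */
public class MatchOrderSketch {

  /** Order the issue attributes to matchColumn: filter first, then count versions. */
  static List<Integer> filterThenCount(List<Integer> versions, IntPredicate filter, int wanted) {
    List<Integer> out = new ArrayList<>();
    for (int v : versions) {
      if (out.size() >= wanted) break;   // enough cells returned
      if (filter.test(v)) out.add(v);    // skipped cells do not count
    }
    return out;
  }

  /** First idea: count versions first, then apply the filter. */
  static List<Integer> countThenFilter(List<Integer> versions, IntPredicate filter, int wanted) {
    List<Integer> out = new ArrayList<>();
    int seen = 0;
    for (int v : versions) {
      if (++seen > wanted) break;        // every cell counts, filtered or not
      if (filter.test(v)) out.add(v);
    }
    return out;
  }

  /** Second idea: column max versions, then filter, then versions the user asked for. */
  static List<Integer> threeStep(List<Integer> versions, IntPredicate filter,
                                 int columnMax, int wanted) {
    List<Integer> out = new ArrayList<>();
    int seen = 0;
    for (int v : versions) {
      if (++seen > columnMax || out.size() >= wanted) break;
      if (filter.test(v)) out.add(v);
    }
    return out;
  }

  public static void main(String[] args) {
    // Column max versions = 3, four versions written, filter skips the newest one.
    List<Integer> beforeCompaction = Arrays.asList(4, 3, 2, 1);
    List<Integer> afterCompaction = Arrays.asList(4, 3, 2);
    IntPredicate skipNewest = v -> v != 4;

    System.out.println(filterThenCount(beforeCompaction, skipNewest, 3));
    System.out.println(filterThenCount(afterCompaction, skipNewest, 3));
    System.out.println(countThenFilter(beforeCompaction, skipNewest, 3));
    System.out.println(countThenFilter(afterCompaction, skipNewest, 3));

    // Column max versions = 5, user asks for 3, filter skips the newest one.
    List<Integer> five = Arrays.asList(5, 4, 3, 2, 1);
    System.out.println(countThenFilter(five, v -> v != 5, 3));
    System.out.println(threeStep(five, v -> v != 5, 5, 3));
  }
}
```

In this model, filterThenCount returns version 1 before compaction but not after, which is the inconsistency reported. countThenFilter gives the same answer in both states but returns only two cells in the max-versions-5 case, while threeStep returns three there and stays consistent across compaction.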
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060614#comment-16060614 ]

Hadoop QA commented on HBASE-17125:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 32m 38s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 3 new or modified test files. |
| 0 | mvndep | 0m 38s | Maven dependency ordering for branch |
| +1 | mvninstall | 8m 21s | master passed |
| +1 | compile | 3m 20s | master passed |
| +1 | checkstyle | 2m 49s | master passed |
| +1 | mvneclipse | 1m 31s | master passed |
| -1 | findbugs | 2m 36s | hbase-client in master has 4 extant Findbugs warnings. |
| -1 | findbugs | 2m 47s | hbase-client in master has 4 extant Findbugs warnings. |
| -1 | findbugs | 7m 31s | hbase-server in master has 10 extant Findbugs warnings. |
| +1 | javadoc | 2m 47s | master passed |
| 0 | mvndep | 0m 34s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 49s | the patch passed |
| +1 | compile | 2m 38s | the patch passed |
| +1 | javac | 2m 38s | the patch passed |
| +1 | checkstyle | 1m 55s | the patch passed |
| +1 | mvneclipse | 1m 6s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 69m 6s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. |
| +1 | findbugs | 8m 57s | the patch passed |
| -1 | javadoc | 0m 52s | hbase-client generated 8 new + 1 unchanged - 1 fixed = 9 total (was 2) |
| +1 | unit | 4m 48s | hbase-client in the patch passed. |
| -1 | unit | 46m 35s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 40s | The patch does not generate ASF License warnings. |
| | | 207m 48s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.filter.TestFilter |

|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12874200/HBASE-17125.master.014.patch |
| JIRA Issue | HBASE-17125 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux b7c5a776da35 4.8.3-std-1 #1 SMP Fri Oct 21 11:15:43 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060456#comment-16060456 ]

Anoop Sam John commented on HBASE-17125:

So this patch is in line with some old proposal that u put right.. I like it.. Doing some more careful reviews. Pls put the latest patch in RB
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060430#comment-16060430 ]

Anoop Sam John commented on HBASE-17125:

Reviewing HBASE-17125.master.014.patch now
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060309#comment-16060309 ]

Hadoop QA commented on HBASE-17125:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 13s | Docker mode activated. |
| 0 | patch | 0m 3s | The patch file was not named according to hbase's naming conventions. Please see https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for instructions. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 6 new or modified test files. |
| 0 | mvndep | 0m 31s | Maven dependency ordering for branch |
| +1 | mvninstall | 3m 35s | master passed |
| +1 | compile | 1m 12s | master passed |
| +1 | checkstyle | 1m 11s | master passed |
| +1 | mvneclipse | 0m 36s | master passed |
| -1 | findbugs | 0m 57s | hbase-client in master has 4 extant Findbugs warnings. |
| -1 | findbugs | 0m 54s | hbase-client in master has 4 extant Findbugs warnings. |
| -1 | findbugs | 2m 58s | hbase-server in master has 10 extant Findbugs warnings. |
| +1 | javadoc | 1m 7s | master passed |
| 0 | mvndep | 0m 14s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 6s | the patch passed |
| +1 | compile | 0m 55s | the patch passed |
| +1 | javac | 0m 55s | the patch passed |
| +1 | checkstyle | 0m 48s | the patch passed |
| +1 | mvneclipse | 0m 25s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 29m 2s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. |
| +1 | findbugs | 3m 44s | the patch passed |
| -1 | javadoc | 0m 17s | hbase-client generated 2 new + 1 unchanged - 1 fixed = 3 total (was 2) |
| +1 | unit | 2m 24s | hbase-client in the patch passed. |
| -1 | unit | 117m 5s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 41s | The patch does not generate ASF License warnings. |
| | | 171m 11s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.procedure.TestMasterProcedureWalLease |

|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12874152/17125-slack-13.txt |
| JIRA Issue | HBASE-17125 |
| Optional Tests | asflicense javac javadoc unit findbugs
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060247#comment-16060247 ]

Ted Yu commented on HBASE-17125:

Review board: https://reviews.apache.org/r/60381/
This is just an alternate approach. Guanghao is the owner of this JIRA.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060137#comment-16060137 ]

Guanghao Zhang commented on HBASE-17125:

[~tedyu] your "slack" idea is bug-prone and has side effects. I tried it in HBASE-17125.master.no-specified-filter.patch.
bq. I have run test suite with updated aggregate patch which passed
As I said in email, "UT passed" doesn't mean that your patch is right. You sent me a v11 patch ("UT passed") and I pointed out the bug in it. Then you fixed it. But the latest patch you sent me is still wrong. Please upload your patch here and to review board, and let more people review it to make it right. Thanks.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059450#comment-16059450 ]

Ted Yu commented on HBASE-17125:

My aggregate patch v11 has a defect in that it doesn't handle ExplicitColumnTracker#reset() correctly.
ColumnCount should have slack and retract() as well.
ExplicitColumnTracker#retract() delegates to ColumnCount#retract().
When ExplicitColumnTracker#reset() is called, it resets the slack in each ColumnCount to the initial value.
I have run the test suite with the updated aggregate patch, which passed (other than the tests flagged by the flaky test dashboard).
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059123#comment-16059123 ]

Hadoop QA commented on HBASE-17125:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 17m 47s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
| 0 | mvndep | 0m 32s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 16s | master passed |
| +1 | compile | 1m 9s | master passed |
| +1 | checkstyle | 0m 53s | master passed |
| +1 | mvneclipse | 0m 27s | master passed |
| -1 | findbugs | 1m 1s | hbase-client in master has 4 extant Findbugs warnings. |
| -1 | findbugs | 2m 44s | hbase-server in master has 10 extant Findbugs warnings. |
| +1 | javadoc | 0m 47s | master passed |
| 0 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 1s | the patch passed |
| +1 | compile | 0m 55s | the patch passed |
| +1 | javac | 0m 55s | the patch passed |
| +1 | checkstyle | 0m 45s | the patch passed |
| +1 | mvneclipse | 0m 24s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 31m 16s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. |
| -1 | findbugs | 3m 26s | hbase-server generated 1 new + 10 unchanged - 0 fixed = 11 total (was 10) |
| -1 | javadoc | 0m 20s | hbase-client generated 8 new + 1 unchanged - 0 fixed = 9 total (was 1) |
| +1 | unit | 2m 48s | hbase-client in the patch passed. |
| -1 | unit | 22m 1s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 16s | The patch does not generate ASF License warnings. |
| | | 95m 19s | |

|| Reason || Tests ||
| FindBugs | module:hbase-server |
| | Should org.apache.hadoop.hbase.regionserver.querymatcher.UserScanQueryMatcher$SpecifiedNumVersionsColumnFilter be a _static_ inner class? At UserScanQueryMatcher.java:inner class? At UserScanQueryMatcher.java:[lines 272-299] |
| Failed junit tests | hadoop.hbase.filter.TestFilter |
| | hadoop.hbase.io.encoding.TestSeekBeforeWithReverseScan |

|| Subsystem || Report/Notes ||
| Docker | Client=1.13.1 Server=1.13.1 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12874037/HBASE-17125.master.013.patch |
| JIRA Issue | HBASE-17125 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux f36cb54ba435
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059035#comment-16059035 ]

Hadoop QA commented on HBASE-17125:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 22s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
| 0 | mvndep | 0m 25s | Maven dependency ordering for branch |
| +1 | mvninstall | 7m 13s | master passed |
| +1 | compile | 2m 34s | master passed |
| +1 | checkstyle | 1m 52s | master passed |
| +1 | mvneclipse | 0m 49s | master passed |
| -1 | findbugs | 2m 11s | hbase-client in master has 4 extant Findbugs warnings. |
| -1 | findbugs | 7m 56s | hbase-server in master has 10 extant Findbugs warnings. |
| +1 | javadoc | 1m 59s | master passed |
| 0 | mvndep | 0m 30s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 51s | the patch passed |
| +1 | compile | 2m 36s | the patch passed |
| +1 | javac | 2m 36s | the patch passed |
| +1 | checkstyle | 1m 50s | the patch passed |
| +1 | mvneclipse | 1m 1s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 59m 29s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. |
| +1 | findbugs | 8m 15s | the patch passed |
| -1 | javadoc | 0m 33s | hbase-client generated 8 new + 1 unchanged - 0 fixed = 9 total (was 1) |
| +1 | unit | 3m 42s | hbase-client in the patch passed. |
| -1 | unit | 35m 17s | hbase-server in the patch failed. |
| -1 | asflicense | 0m 35s | The patch generated 3 ASF License warnings. |
| | | 144m 10s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.filter.TestFilter |
| | hadoop.hbase.master.locking.TestLockProcedure |
| | hadoop.hbase.filter.TestInvocationRecordFilter |
| | hadoop.hbase.io.encoding.TestSeekBeforeWithReverseScan |

|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12874024/HBASE-17125.master.012.patch |
| JIRA Issue | HBASE-17125 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux bfde76c5c6f2 4.8.3-std-1 #1 SMP Fri Oct 21 11:15:43 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059000#comment-16059000 ]

Guanghao Zhang commented on HBASE-17125:

Attached the 013 patch, which removes the two new SQMs to make the code easier to read and fixes the bug inside UserScanQueryMatcher itself.

> Inconsistent result when use filter to read data
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
> Issue Type: Bug
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Critical
> Fix For: 3.0.0, 2.0.0-alpha-2
>
> Attachments: example.diff, HBASE-17125.master.001.patch,
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch,
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch,
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch,
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch,
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch,
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch,
> HBASE-17125.master.011.patch, HBASE-17125.master.012.patch,
> HBASE-17125.master.013.patch, HBASE-17125.master.checkReturnedVersions.patch,
> HBASE-17125.master.no-specified-filter.patch
>
> Assume a column's max versions is 3, and we write 4 versions of this column.
> The oldest version is not removed immediately, but from the user's view it
> is gone. When the user queries with a filter and the filter skips a newer
> version, the oldest version becomes visible again. After the region is
> compacted, the oldest version will never be seen again. So the query returns
> inconsistent results before and after region compaction, which is confusing
> for users.
> The cause is the matchColumn method of UserScanQueryMatcher. It first checks
> the cell against the filter, then checks the number of versions needed. So if
> the filter skips a newer version, the oldest version is returned again as
> long as it has not been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two
> solutions for this problem. The first idea is to check the number of versions
> first, then check the cell against the filter. This matches the comment on
> setFilter, which says the filter is called after all tests for ttl, column
> match, deletes and max versions have been run.
> {code}
>   /**
>    * Apply the specified server-side filter when performing the Query.
>    * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>    * for ttl, column match, deletes and max versions have been run.
>    * @param filter filter to run on the server
>    * @return this for invocation chaining
>    */
>   public Query setFilter(Filter filter) {
>     this.filter = filter;
>     return this;
>   }
> {code}
> But this idea has another problem. If a column's max versions is 5 and the
> user's query needs only 3 versions, the version count is checked first and
> the cell is checked against the filter afterwards. So the result may contain
> fewer than 3 cells, even though there are 2 more versions that are never read.
> So the second idea has three steps.
> 1. Check against the max versions of this column.
> 2. Check the kv against the filter.
> 3. Check the number of versions the user needs.
> But this makes ScanQueryMatcher more complicated, and it breaks the javadoc
> of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
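The three check orders discussed in the issue description can be compared with a small stand-alone model. This is only a sketch under stated assumptions: a column is reduced to a list of cell timestamps ordered newest first, the filter is a plain predicate on the timestamp, and the class and method names (VersionCheckOrder, filterThenVersions, versionsThenFilter, threeStep) are hypothetical and do not exist in HBase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical stand-alone model of the three check orders; not HBase code.
// A column is a list of cell timestamps, newest first.
final class VersionCheckOrder {

    // Behaviour reported as the bug: filter first, then count versions.
    static List<Long> filterThenVersions(List<Long> cells, LongPredicate filter, int wanted) {
        List<Long> out = new ArrayList<>();
        for (long ts : cells) {
            if (out.size() == wanted) break;
            if (filter.test(ts)) out.add(ts);
        }
        return out;
    }

    // First idea: count versions first, then apply the filter.
    static List<Long> versionsThenFilter(List<Long> cells, LongPredicate filter, int wanted) {
        List<Long> out = new ArrayList<>();
        int seen = 0;
        for (long ts : cells) {
            if (seen == wanted) break;
            seen++;
            if (filter.test(ts)) out.add(ts);
        }
        return out;
    }

    // Second idea: column max versions, then filter, then versions requested.
    static List<Long> threeStep(List<Long> cells, LongPredicate filter, int columnMax, int wanted) {
        List<Long> out = new ArrayList<>();
        int seen = 0;
        for (long ts : cells) {
            if (seen == columnMax || out.size() == wanted) break; // steps 1 and 3
            seen++;
            if (filter.test(ts)) out.add(ts);                      // step 2
        }
        return out;
    }

    public static void main(String[] args) {
        LongPredicate skipNewest = ts -> ts != 4;
        // Column max versions is 3, 4 versions written; compaction drops ts=1.
        List<Long> beforeCompaction = Arrays.asList(4L, 3L, 2L, 1L);
        List<Long> afterCompaction = Arrays.asList(4L, 3L, 2L);
        System.out.println(filterThenVersions(beforeCompaction, skipNewest, 3)); // [3, 2, 1]
        System.out.println(filterThenVersions(afterCompaction, skipNewest, 3));  // [3, 2]
        System.out.println(threeStep(beforeCompaction, skipNewest, 3, 3));       // [3, 2]
        System.out.println(threeStep(afterCompaction, skipNewest, 3, 3));        // [3, 2]
    }
}
```

In this model the filter-first order returns the expired timestamp 1 before compaction but not after it, while the three-step order returns the same cells in both cases. With a column max of 5 and 3 versions requested, versionsThenFilter can return fewer than 3 cells when the filter rejects one of the newest three, which is the drawback of the first idea noted in the description.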
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058872#comment-16058872 ]

Guanghao Zhang commented on HBASE-17125:

To summarize the key points of this fix:
1. Easy for users: no extra work is needed on their side.
2. No added complexity in ColumnTracker. The fix applies only when a filter is used.
3. Accurate javadoc, and Scan's setMaxVersions is marked as deprecated.

Attached the 012 patch and uploaded it to review board, too. The new patch adds two new SQMs, UserScanWithFilterQueryMatcher and RawScanWithFilterQueryMatcher, and fixes the bug inside these SQMs. If there is no filter, the behavior should be the same as the previous implementation. Thanks.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058800#comment-16058800 ]

Duo Zhang commented on HBASE-17125:

But most users just do not use filters, so I do not think it is a good idea to add the word 'filter' to the method name.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058792#comment-16058792 ]

Phil Yang commented on HBASE-17125:

I think setVersions may still confuse us and users. Call it setMaxVersionsAfterFilters? And I think we should add some comments here to tell users how we deal with the cf's VERSIONS, filters and Scan#setMaxVersions.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058699#comment-16058699 ]

Ted Yu commented on HBASE-17125:

Based on patch v11 and the snippet I posted earlier, I have an aggregate patch which passes all Filter tests and the visibility test. The complexity of the aggregate patch is on par with patch v11 (sans the SpecificNumberVersionsFilter). Running through the whole test suite now.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058630#comment-16058630 ]

Anoop Sam John commented on HBASE-17125:

That makes sense, Duo. We already have a FilterWrapper that is used to wrap user-side filters. Please check how this idea turns out. Thanks a lot, Guanghao, for your perseverance. Appreciate it!
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058623#comment-16058623 ]

Anoop Sam John commented on HBASE-17125:

bq. I think we can append a SpecificNumberVersionsFilter at the last if filter and versions are both present when initialize a scan at RS side.

You mean the HBase code itself adds it, rather than asking the user to do this, right? I am fine with any approach (whichever all of us judge to be the best) that solves this bug within the HBase code itself, rather than asking the user to do extra work. Only if that is really not possible should we, as a last resort, ask the user to do something extra. Will look at the patches and approaches. Thanks, guys.
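The idea quoted in the comment above, where the server itself appends a version-limiting filter after the user's filter, might look roughly like the following sketch. Everything here is an assumption made for illustration: CellFilter is a simplified stand-in rather than the HBase Filter API, VersionLimit is a hypothetical name rather than the SpecificNumberVersionsFilter from the patches, and cells are assumed to arrive grouped by column with the newest timestamp first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of appending a version limit behind a user filter; not HBase code.
final class AppendedVersionLimit {

    // Simplified stand-in for a per-cell filter decision.
    interface CellFilter {
        boolean include(String column, long timestamp);
    }

    // Runs the user filter first, then lets through at most `versions` cells per column.
    // Declared static because it needs no reference to an enclosing instance.
    static final class VersionLimit implements CellFilter {
        private final CellFilter userFilter;
        private final int versions;
        private String currentColumn;
        private int returned;

        VersionLimit(CellFilter userFilter, int versions) {
            this.userFilter = userFilter;
            this.versions = versions;
        }

        @Override
        public boolean include(String column, long timestamp) {
            if (!column.equals(currentColumn)) { // new column: reset the counter
                currentColumn = column;
                returned = 0;
            }
            if (!userFilter.include(column, timestamp)) {
                return false;                    // rejected by the user filter
            }
            if (returned >= versions) {
                return false;                    // enough versions already returned
            }
            returned++;
            return true;
        }
    }

    // Applies a filter to parallel arrays of columns and timestamps.
    static List<String> scan(String[] columns, long[] timestamps, CellFilter filter) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) {
            if (filter.include(columns[i], timestamps[i])) {
                out.add(columns[i] + "@" + timestamps[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        CellFilter user = (column, ts) -> ts != 3; // user filter skips timestamp 3
        String[] columns = {"a", "a", "a", "b", "b"};
        long[] timestamps = {3, 2, 1, 2, 1};
        System.out.println(scan(columns, timestamps, new VersionLimit(user, 1))); // [a@2, b@2]
    }
}
```

Because the wrapper is created by the server only when both a filter and a version count are present, the user does not have to change anything, which is the point made in the comment. The sketch does not model the column's own max versions check, which would still have to run before the user filter.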
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058621#comment-16058621 ]

Duo Zhang commented on HBASE-17125:
-----------------------------------

Yeah, I mean we do it ourselves, not the users.

> Inconsistent result when use filter to read data
> ------------------------------------------------
>
>                 Key: HBASE-17125
>                 URL: https://issues.apache.org/jira/browse/HBASE-17125
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: example.diff, HBASE-17125.master.001.patch,
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch,
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch,
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch,
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch,
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch,
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch,
> HBASE-17125.master.011.patch, HBASE-17125.master.checkReturnedVersions.patch,
> HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a column's max versions is 3, and we write 4 versions of this column.
> The oldest version is not removed immediately, but from the user's point of
> view it is gone. When a user queries with a filter and the filter skips a
> newer version, the oldest version becomes visible again. After the region is
> compacted, the oldest version is never seen again. This is confusing for
> users: the same query returns inconsistent results before and after region
> compaction.
> The cause is the matchColumn method of UserScanQueryMatcher. It first checks
> the cell against the filter, then checks the number of versions needed. So if
> the filter skips a newer version, the oldest version is returned again as
> long as it has not been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two
> candidate solutions. The first idea is to check the number of versions first,
> then check the cell against the filter. This matches the javadoc of
> setFilter, which says the filter is called after all tests for ttl, column
> match, deletes and max versions have been run.
> {code}
>   /**
>    * Apply the specified server-side filter when performing the Query.
>    * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>    * for ttl, column match, deletes and max versions have been run.
>    * @param filter filter to run on the server
>    * @return this for invocation chaining
>    */
>   public Query setFilter(Filter filter) {
>     this.filter = filter;
>     return this;
>   }
> {code}
> But this idea has another problem. If a column's max versions is 5 and the
> user query only needs 3 versions, the matcher first checks the version count
> and then checks the cell against the filter. So the result may contain fewer
> than 3 cells, even though there are 2 more versions that are never read.
> So the second idea has three steps.
> 1. check against the max versions of this column
> 2. check the kv against the filter
> 3. check the number of versions the user needs.
> But this makes ScanQueryMatcher more complicated, and it breaks the javadoc
> of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
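The check orderings discussed in the issue description can be compared with a small stand-alone simulation. This is only a sketch, not HBase code: the class VersionCheckOrderSketch, its method names, and the modelling of one column as a newest-first list of timestamps are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

/** Toy model of the check orderings from the issue description; not HBase code. */
class VersionCheckOrderSketch {

  /** Filter first, then count returned versions (the ordering reported as buggy). */
  static List<Long> filterThenCount(List<Long> newestFirst, LongPredicate filter, int wanted) {
    List<Long> out = new ArrayList<>();
    for (long ts : newestFirst) {
      if (out.size() >= wanted) break;      // enough versions returned
      if (filter.test(ts)) out.add(ts);     // skipped cells do not count
    }
    return out;
  }

  /** Count versions first, then filter (the first idea). */
  static List<Long> countThenFilter(List<Long> newestFirst, LongPredicate filter, int wanted) {
    List<Long> out = new ArrayList<>();
    int seen = 0;
    for (long ts : newestFirst) {
      if (++seen > wanted) break;           // every cell counts, filtered or not
      if (filter.test(ts)) out.add(ts);
    }
    return out;
  }

  /** Family max versions, then filter, then requested versions (the second idea). */
  static List<Long> threeStep(List<Long> newestFirst, LongPredicate filter,
                              int familyMax, int wanted) {
    List<Long> out = new ArrayList<>();
    int seen = 0;
    for (long ts : newestFirst) {
      if (++seen > familyMax) break;        // step 1: hide versions beyond the family limit
      if (out.size() >= wanted) break;      // step 3: stop once the user has enough
      if (filter.test(ts)) out.add(ts);     // step 2: apply the user filter
    }
    return out;
  }
}
```

With a family max of 3, four stored versions (4, 3, 2, 1) and a filter that skips timestamp 4, filterThenCount returns 3, 2, 1 before compaction but only 3, 2 once version 1 has been compacted away, while threeStep returns 3, 2 in both cases. With a family max of 5 and 3 requested versions, countThenFilter returns only two cells when the newest one is filtered out, whereas threeStep still returns three.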
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058620#comment-16058620 ]

Anoop Sam John commented on HBASE-17125:
----------------------------------------

bq. I think we can append a SpecificNumberVersionsFilter at the last if filter and versions are both present when initialize a scan at RS side.

You mean the HBase code itself adds it, rather than asking the user to do this, right?
I am fine with any approach (whichever we all agree is best) that solves this bug in the HBase code itself, rather than asking the user to do extra work. Only if that is really not possible should we, as a last resort, ask the user to do something extra.
Will look at the patches and approaches. Thanks, guys.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058594#comment-16058594 ]

Duo Zhang commented on HBASE-17125:
-----------------------------------

I think we can append a SpecificNumberVersionsFilter at the end if both a filter and versions are present when initializing a scan on the RS side. This way we do not need to modify the logic of ColumnTracker; the code is already complicated enough, so let's not add new stuff to it. I still think a problem introduced by filters should also be addressed by a filter.

The name 'setVersions' does not make sense; change it to 'readAllVersions'? And how do you deal with raw scans?

Thanks.
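The suggestion of appending a version-limiting filter after the user's filter can be pictured with a toy model. This is a sketch under stated assumptions, not the proposed SpecificNumberVersionsFilter and not the HBase Filter API: the Cell, VersionLimit and scan names are hypothetical, cells are assumed to arrive grouped by column and newest first, and the column family's max versions is assumed to have been enforced already by the column tracker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of a trailing version-limit filter; all names are hypothetical. */
class VersionLimitFilterSketch {

  /** Minimal stand-in for a cell: column qualifier plus timestamp. */
  static final class Cell {
    final String qualifier;
    final long ts;

    Cell(String qualifier, long ts) {
      this.qualifier = qualifier;
      this.ts = ts;
    }
  }

  /** Accepts at most 'limit' cells per column; the counter resets when the column changes. */
  static final class VersionLimit implements Predicate<Cell> {
    private final int limit;
    private String current;
    private int count;

    VersionLimit(int limit) {
      this.limit = limit;
    }

    @Override
    public boolean test(Cell c) {
      if (!c.qualifier.equals(current)) {
        current = c.qualifier;
        count = 0;
      }
      return ++count <= limit;
    }
  }

  /** Runs the user filter first, then the appended version limit. */
  static List<Cell> scan(List<Cell> cells, Predicate<Cell> userFilter, int versions) {
    Predicate<Cell> limit = new VersionLimit(versions);
    List<Cell> out = new ArrayList<>();
    for (Cell c : cells) {
      // Short-circuit: the limit only counts cells the user filter accepted.
      if (userFilter.test(c) && limit.test(c)) {
        out.add(c);
      }
    }
    return out;
  }
}
```

Because the limit sits behind the user filter, only accepted cells consume the per-column budget, so the version-counting logic upstream needs no extra bookkeeping for skipped cells.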
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058546#comment-16058546 ]

Hadoop QA commented on HBASE-17125:
-----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 18m 48s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
| 0 | mvndep | 0m 11s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 6s | master passed |
| +1 | compile | 1m 2s | master passed |
| +1 | checkstyle | 0m 50s | master passed |
| +1 | mvneclipse | 0m 27s | master passed |
| -1 | findbugs | 0m 58s | hbase-client in master has 4 extant Findbugs warnings. |
| -1 | findbugs | 2m 59s | hbase-server in master has 10 extant Findbugs warnings. |
| +1 | javadoc | 0m 52s | master passed |
| 0 | mvndep | 0m 14s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 9s | the patch passed |
| +1 | compile | 1m 3s | the patch passed |
| +1 | javac | 1m 3s | the patch passed |
| +1 | checkstyle | 0m 49s | the patch passed |
| +1 | mvneclipse | 0m 26s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 32m 58s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. |
| +1 | findbugs | 5m 6s | the patch passed |
| -1 | javadoc | 0m 19s | hbase-client generated 5 new + 1 unchanged - 0 fixed = 6 total (was 1) |
| +1 | unit | 2m 46s | hbase-client in the patch passed. |
| -1 | unit | 22m 42s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
| | | 99m 12s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestKeepDeletes |
| | hadoop.hbase.regionserver.querymatcher.TestUserScanQueryMatcher |

|| Subsystem || Report/Notes ||
| Docker | Client=1.13.1 Server=1.13.1 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12873971/HBASE-17125.master.checkReturnedVersions.patch |
| JIRA Issue | HBASE-17125 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 0c417fc24366 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 3489a1b |
| Default Java
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058476#comment-16058476 ]

Ted Yu commented on HBASE-17125:
--------------------------------

Can you put the latest patch on Review Board?
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058430#comment-16058430 ]

Guanghao Zhang commented on HBASE-17125:
----------------------------------------

bq. If the filter decide to skip a version, then reduce the returned count in ColumnTracker.

This method is too tricky and easy to get wrong. So I uploaded a new patch (checkReturnedVersions.patch) which uses the second idea in the description. It matches a column in three steps:
1. check the column family's max versions
2. check by filter
3. check the returned versions (this can be set by the user).

About the setFilter() javadoc: it says "called AFTER all tests for ttl, column match, deletes and max versions have been run." After talking with [~yangzhe1991] and [~Apache9], we think "max versions" is easy to misunderstand, because the column family has a max versions config and the user can also set a max versions on the scan. So in the new patch I updated the javadoc of the setFilter() method. The new javadoc is "called AFTER all tests for ttl, column match, deletes and column family's max versions have been run". The patch also adds a new setVersions() method for scan, which sets how many versions will be returned to the user, and marks the setMaxVersions() method @deprecated.

Thanks.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057556#comment-16057556 ]

Steve Loughran commented on HBASE-17125:
----------------------------------------

bq. Then why don't you implement it by yourself? If you think it is easy, then please implement it.

Generally, as a committer it's actually more productive to nurture other developers into working towards what you believe to be the right answer than to do it yourself. As well as sharing some of your unrealistic set of deliverables with others, you can be the reviewer who gets the stuff in, instead of having a patch you have to chase other people to review. Long term: the more people you can get to collaborate, the better for the project all round.

No opinions on the patch, just making sure everyone works together on this. Thanks.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057381#comment-16057381 ]

Hadoop QA commented on HBASE-17125:
-----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 12m 29s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
| +1 | mvninstall | 3m 40s | master passed |
| +1 | compile | 0m 39s | master passed |
| +1 | checkstyle | 0m 47s | master passed |
| +1 | mvneclipse | 0m 14s | master passed |
| -1 | findbugs | 2m 34s | hbase-server in master has 12 extant Findbugs warnings. |
| +1 | javadoc | 0m 28s | master passed |
| +1 | mvninstall | 0m 44s | the patch passed |
| +1 | compile | 0m 38s | the patch passed |
| +1 | javac | 0m 38s | the patch passed |
| +1 | checkstyle | 0m 48s | the patch passed |
| +1 | mvneclipse | 0m 14s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 31m 9s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. |
| +1 | findbugs | 3m 18s | the patch passed |
| +1 | javadoc | 0m 31s | the patch passed |
| -1 | unit | 124m 14s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 22s | The patch does not generate ASF License warnings. |
| | | 183m 11s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.security.visibility.TestVisibilityLabelsWithDeletes |

|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12873837/HBASE-17125.master.no-specified-filter.patch |
| JIRA Issue | HBASE-17125 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 4238af0c8af6 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 83be50c |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-HBASE-Build/7276/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/7276/artifact/patchprocess/patch-unit-hbase-server.txt |
| unit test logs | https://builds.apache.org/job/PreCommit-HBASE-Build/7276/artifact/patchprocess/patch-unit-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/7276/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7276/console |
| Powered by | Apache Yetus 0.3.0
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057373#comment-16057373 ]

Hadoop QA commented on HBASE-17125:
-----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 13m 59s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
| +1 | mvninstall | 4m 40s | master passed |
| +1 | compile | 0m 50s | master passed |
| +1 | checkstyle | 0m 56s | master passed |
| +1 | mvneclipse | 0m 18s | master passed |
| -1 | findbugs | 3m 15s | hbase-server in master has 12 extant Findbugs warnings. |
| +1 | javadoc | 0m 37s | master passed |
| +1 | mvninstall | 0m 56s | the patch passed |
| +1 | compile | 0m 50s | the patch passed |
| +1 | javac | 0m 50s | the patch passed |
| +1 | checkstyle | 0m 55s | the patch passed |
| +1 | mvneclipse | 0m 17s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | hadoopcheck | 31m 31s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. |
| +1 | findbugs | 2m 44s | the patch passed |
| +1 | javadoc | 0m 26s | the patch passed |
| -1 | unit | 119m 48s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 25s | The patch does not generate ASF License warnings. |
| | | 183m 5s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.security.visibility.TestVisibilityLabelsWithDeletes |

|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12873837/HBASE-17125.master.no-specified-filter.patch |
| JIRA Issue | HBASE-17125 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux ffcf4b97ea45 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 83be50c |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-HBASE-Build/7275/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/7275/artifact/patchprocess/patch-unit-hbase-server.txt |
| unit test logs | https://builds.apache.org/job/PreCommit-HBASE-Build/7275/artifact/patchprocess/patch-unit-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/7275/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7275/console |
| Powered by | Apache Yetus 0.3.0
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057276#comment-16057276 ]

Ted Yu commented on HBASE-17125:

When I ran TestImportTSVWithVisibilityLabels with the no-specified-filter patch, it hung.

Please put HBASE-17125.master.no-specified-filter.patch on Review Board. Thanks.

> Inconsistent result when use filter to read data
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
> Issue Type: Bug
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch,
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch,
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch,
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch,
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch,
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch,
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch,
> HBASE-17125.master.011.patch, HBASE-17125.master.no-specified-filter.patch
>
> Assume a column's max versions is 3 and we write 4 versions of this column.
> The oldest version is not removed immediately, but from the user's point of
> view it is gone. When a user queries with a filter and the filter skips a
> newer version, the oldest version becomes visible again. After the region is
> compacted, the oldest version is never seen again. This is confusing for
> users: the same query returns inconsistent results before and after region
> compaction.
>
> The cause is the matchColumn method of UserScanQueryMatcher. It first checks
> the cell against the filter, then checks the number of versions needed. So if
> the filter skips a newer version, the oldest version is seen again as long as
> it has not been physically removed.
>
> After an offline discussion with [~Apache9] and [~fenghh], we have two
> possible solutions for this problem.
>
> The first idea is to check the number of versions first, then check the cell
> against the filter. This matches the comment on setFilter, which says the
> filter is called after all tests for ttl, column match, deletes and max
> versions have been run.
> {code}
>   /**
>    * Apply the specified server-side filter when performing the Query.
>    * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>    * for ttl, column match, deletes and max versions have been run.
>    * @param filter filter to run on the server
>    * @return this for invocation chaining
>    */
>   public Query setFilter(Filter filter) {
>     this.filter = filter;
>     return this;
>   }
> {code}
> But this idea has another problem. Suppose a column's max versions is 5 and
> the user's query needs only 3 versions. The matcher checks the version count
> first and then checks the cell against the filter, so the result may contain
> fewer than 3 cells even though there are 2 more versions that are never read.
>
> So the second idea has three steps:
> 1. check against the max versions of this column
> 2. check the kv against the filter
> 3. check against the number of versions the user needs.
> But this makes the ScanQueryMatcher more complicated, and it breaks the
> javadoc of Query.setFilter.
>
> We don't have a final solution for this problem yet. Suggestions are welcome.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
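The two orderings discussed in the issue description can be illustrated with a small stand-alone simulation. This is only a sketch under simplifying assumptions: cells of one column are modeled as a newest-first list of timestamps, the filter is a plain predicate, and the class and method names (FilterOrderSketch, filterThenVersions, versionsThenFilter) are made up for illustration and are not HBase APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

public class FilterOrderSketch {

    /** Behavior described as the cause of the bug: filter first, then count versions. */
    public static List<Long> filterThenVersions(List<Long> newestFirst, LongPredicate filter, int maxVersions) {
        List<Long> out = new ArrayList<>();
        for (long ts : newestFirst) {
            if (!filter.test(ts)) {
                continue; // a skipped cell does not consume a version slot
            }
            if (out.size() == maxVersions) {
                break;
            }
            out.add(ts);
        }
        return out;
    }

    /** First idea from the description: count versions first, then apply the filter. */
    public static List<Long> versionsThenFilter(List<Long> newestFirst, LongPredicate filter, int maxVersions) {
        List<Long> out = new ArrayList<>();
        int seen = 0;
        for (long ts : newestFirst) {
            if (seen == maxVersions) {
                break; // older cells are treated as already gone
            }
            seen++;
            if (filter.test(ts)) {
                out.add(ts);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Column max versions is 3; 4 versions were written and not yet compacted.
        List<Long> beforeCompaction = List.of(4L, 3L, 2L, 1L);
        List<Long> afterCompaction = List.of(4L, 3L, 2L);
        LongPredicate skipNewest = ts -> ts != 4L;

        // Filter first: the "gone" version 1 reappears until compaction removes it.
        System.out.println(filterThenVersions(beforeCompaction, skipNewest, 3)); // [3, 2, 1]
        System.out.println(filterThenVersions(afterCompaction, skipNewest, 3));  // [3, 2]

        // Versions first: the result is the same before and after compaction.
        System.out.println(versionsThenFilter(beforeCompaction, skipNewest, 3)); // [3, 2]
        System.out.println(versionsThenFilter(afterCompaction, skipNewest, 3));  // [3, 2]

        // The drawback of the first idea: column max is 5, the query asks for 3,
        // and only 2 cells come back although versions 2 and 1 still exist.
        List<Long> fiveVersions = List.of(5L, 4L, 3L, 2L, 1L);
        System.out.println(versionsThenFilter(fiveVersions, ts -> ts != 5L, 3)); // [4, 3]
    }
}
```

The last call shows why the description goes on to propose a three-step check: counting against the query's limit before filtering can return fewer cells than requested.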
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057233#comment-16057233 ]

Guanghao Zhang commented on HBASE-17125:

bq. My request would be lets try that approach

I attached HBASE-17125.master.no-specified-filter.patch for it. The new approach still checks versions first and then checks by filter. ColumnTracker uses a returnedVersions counter to track how many versions will be returned. checkVersions checks the max versions against the column family's setting, while the number of returned versions is controlled by the maxVersions of the user's scan. If the filter decides to skip a version, the returned count in ColumnTracker is reduced.
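The counting scheme in Guanghao Zhang's comment above (check against the column family's max versions, let the scan's maxVersions bound what is returned, and give the slot back when the filter skips a cell) might look roughly like the following. This is a hedged sketch, not the code from the attached patch: the Tracker class, its fields, and onFilterSkip are hypothetical names, and cells are again modeled as newest-first timestamps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

public class ReturnedVersionsSketch {

    /** Hypothetical stand-in for the per-column counters kept by a column tracker. */
    static final class Tracker {
        private final int familyMaxVersions; // limit configured on the column family
        private final int scanMaxVersions;   // limit requested by the user's scan
        private int checked;                 // versions seen so far for this column
        private int returned;                // versions counted as returned so far

        Tracker(int familyMaxVersions, int scanMaxVersions) {
            this.familyMaxVersions = familyMaxVersions;
            this.scanMaxVersions = scanMaxVersions;
        }

        /** Returns true if the next cell may be handed to the filter. */
        boolean checkVersions() {
            if (checked >= familyMaxVersions || returned >= scanMaxVersions) {
                return false;
            }
            checked++;
            returned++;
            return true;
        }

        /** Called when the filter skips a cell: free the returned slot again. */
        void onFilterSkip() {
            returned--;
        }
    }

    public static List<Long> scan(List<Long> newestFirst, LongPredicate filter,
                                  int familyMaxVersions, int scanMaxVersions) {
        Tracker tracker = new Tracker(familyMaxVersions, scanMaxVersions);
        List<Long> out = new ArrayList<>();
        for (long ts : newestFirst) {
            if (!tracker.checkVersions()) {
                break;
            }
            if (filter.test(ts)) {
                out.add(ts);
            } else {
                tracker.onFilterSkip();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Family max 3, 4 versions on disk: version 1 stays hidden either way.
        System.out.println(scan(List.of(4L, 3L, 2L, 1L), ts -> ts != 4L, 3, 3)); // [3, 2]
        System.out.println(scan(List.of(4L, 3L, 2L), ts -> ts != 4L, 3, 3));     // [3, 2]
        // Family max 5, scan asks for 3: the skipped cell does not cost a slot.
        System.out.println(scan(List.of(5L, 4L, 3L, 2L, 1L), ts -> ts != 5L, 5, 3)); // [4, 3, 2]
    }
}
```

In this model the family limit is never refunded, which keeps results stable across compaction, while the scan limit is refunded on a skip, so the user still gets the requested number of versions when enough live ones exist.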
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057199#comment-16057199 ]

Guanghao Zhang commented on HBASE-17125:

bq. Can you try it out once Guanghao Zhang?

OK. I will try to trigger Hadoop QA again.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057017#comment-16057017 ]

ramkrishna.s.vasudevan commented on HBASE-17125:

I was following this discussion previously and lost track. I have seen this problem come up consistently. In all the different JIRAs related to it, the fix inside the core always had some side effect one way or the other.

Regarding Anoop's comment on Visibility labels and ACL: I think that if we remove the logic in VisibilityLabelFilter's filterKV that gets the max number of versions, add the new filter at the end in VisibilityController, and set scan.setMaxVersions(), all the visibility tests should still pass. Can you try it out once [~zghaobac]?

I think we should have a javadoc and a new filter for the user. In the case where the user really wants multiple versions returned even while filtering, they can use the new filter (if it covers all the cases), since the core-side changes are minimal and, most importantly, bug free.

IMHO it is fine with me to add a new filter and expose it to users who need this filter + version specific usage.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056975#comment-16056975 ]

Anoop Sam John commented on HBASE-17125:

Agreed, Duo. The javadoc does say when the filter will be called. But if we can solve this issue in our own code, that would be great. Your concern about added complexity is also valid. My request would be: let's try that approach as well (I don't mind who does it :-) ) and see how complex it is and whether it has any perf impact.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056973#comment-16056973 ]

Anoop Sam John commented on HBASE-17125:

The two reasons Ted gave are why I am wary of asking users to use a new Filter. I did not fully check his suggestion with respect to the code and possible side effects, but IMHO we should try it out. If we can do it without much perf penalty, why not! If the code itself can handle this, that is always better, and users will be much happier than if they have to remember and use another filter. Let's be open to all possible ways. My request here would be: please look for ways in which we can solve it on our own.

Everyone here agrees that the present situation is a bug and really bad behavior from the system. We never give back deterministic results; everything depends on other factors, such as whether compaction has been done.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056932#comment-16056932 ]

Duo Zhang commented on HBASE-17125:
---

And I need to say again, we have never broken the javadoc of setMaxVersions. If you do not use a filter, the behavior is correct. If you use a filter, then this is what the javadoc of setFilter says: the filter is tested last.

{quote}
  /**
   * Apply the specified server-side filter when performing the Query.
   * Only {@link Filter#filterKeyValue(org.apache.hadoop.hbase.Cell)} is called AFTER all tests
   * for ttl, column match, deletes and max versions have been run.
   * @param filter filter to run on the server
   * @return this for invocation chaining
   */
{quote}

Filters always introduce behaviors that are not intuitive, because they are too flexible. For example, PageFilter may still return more rows than configured. You need to know the details of HBase if you want to use filters correctly.

That's why I want to fix the problem in a simpler way rather than a complicated way. It does not make any big difference to users. Normal users just do not care, and advanced users must know the implementation details anyway.

Thanks.
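The PageFilter remark in Duo Zhang's comment above can be made concrete with a toy model. As far as the PageFilter javadoc describes it, the filter is applied separately on each region server, so a page size of N bounds the rows per region rather than the rows the client receives. The sketch below only simulates that effect with plain lists; PerRegionPageLimitSketch and scanWithPerRegionLimit are hypothetical names, not HBase classes.

```java
import java.util.ArrayList;
import java.util.List;

public class PerRegionPageLimitSketch {

    /** Applies the page limit inside each region independently, then concatenates. */
    public static List<String> scanWithPerRegionLimit(List<List<String>> regions, int pageSize) {
        List<String> clientRows = new ArrayList<>();
        for (List<String> region : regions) {
            int emitted = 0; // the counter starts from zero for every region
            for (String row : region) {
                if (emitted == pageSize) {
                    break;
                }
                clientRows.add(row);
                emitted++;
            }
        }
        return clientRows;
    }

    public static void main(String[] args) {
        List<List<String>> regions = List.of(
                List.of("a1", "a2", "a3"),
                List.of("b1", "b2", "b3"));
        // Page size 2, but the client sees 4 rows because each region counts on its own.
        System.out.println(scanWithPerRegionLimit(regions, 2)); // [a1, a2, b1, b2]
    }
}
```

A client that needs a strict page size therefore has to enforce the limit itself after the rows arrive.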
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056895#comment-16056895 ]

Duo Zhang commented on HBASE-17125:
---

We have already discussed how to implement it many times, both in our company and here.

{quote}
which doesn't add much more complexity
{quote}

Then why don't you implement it yourself? If you think it is easy, then please implement it.

Also, your approach loses the ability to control the versions passed to the filter. That cannot be addressed by adding a new filter at the beginning, because you will always reset the version count if the filter list returns SKIP.

Thanks.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056871#comment-16056871 ]

Ted Yu commented on HBASE-17125:

The reasons Anoop and I don't favor SpecifiedNumVersionsColumnFilter are:
* it is not intuitive.
* users may use it incorrectly (consider nested FilterLists).
Guanghao has agreed to try the new approach, which doesn't add much more complexity on top of what is posted so far. Let's give him some time.

> Inconsistent result when use filter to read data
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
> Issue Type: Bug
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Critical
> Fix For: 2.0.0
> Attachments: example.diff, HBASE-17125.master.001.patch,
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch,
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch,
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch,
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch,
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch,
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch
>
> Assume a column's max versions is 3, and we write 4 versions of this column.
> The oldest version is not removed immediately, but from the user's view it
> is gone. When a user queries with a filter and the filter skips a newer
> version, the oldest version is seen again. After the region is compacted,
> the oldest version is never seen. So it is weird for the user: the query
> returns inconsistent results before and after region compaction.
> The reason is the matchColumn method of UserScanQueryMatcher. It first checks
> the cell by filter, then checks the number of versions needed. So if the
> filter skips the new version, the oldest version is seen again as long as it
> has not been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two
> solutions for this problem. The first idea is to check the number of versions
> first, then check the cell by filter. As the javadoc of setFilter says, the
> filter is called after all tests for ttl, column match, deletes and max
> versions have been run.
> {code}
> /**
>  * Apply the specified server-side filter when performing the Query.
>  * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>  * for ttl, column match, deletes and max versions have been run.
>  * @param filter filter to run on the server
>  * @return this for invocation chaining
>  */
> public Query setFilter(Filter filter) {
>   this.filter = filter;
>   return this;
> }
> {code}
> But this idea has another problem. If a column's max versions is 5 and the
> user's query only needs 3 versions, it first checks the number of versions,
> then checks the cell by filter. So the number of cells in the result may be
> less than 3, while there are 2 more versions that are never read.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which the user needs.
> But this makes the ScanQueryMatcher more complicated, and it breaks the
> javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.
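The inconsistency reported in the issue can be reproduced with a small model. This is a sketch under stated assumptions, not the UserScanQueryMatcher code: cell versions are plain timestamps sorted newest first, compaction is modelled as simply dropping everything beyond the column's max versions, and the names FilterBeforeVersionCount, filterThenCount and compact are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical model, not HBase code: a cell version is just its timestamp.
final class FilterBeforeVersionCount {

    // The order described in the issue: filter first, then count versions.
    static List<Long> filterThenCount(List<Long> newestFirst, int maxVersions,
                                      LongPredicate filter) {
        List<Long> result = new ArrayList<>();
        for (long ts : newestFirst) {
            if (!filter.test(ts)) {
                continue; // a skipped cell does not use up a version slot
            }
            if (result.size() >= maxVersions) {
                break;
            }
            result.add(ts);
        }
        return result;
    }

    // Simplified compaction: physically keep only the newest maxVersions cells.
    static List<Long> compact(List<Long> newestFirst, int maxVersions) {
        int keep = Math.min(maxVersions, newestFirst.size());
        return new ArrayList<>(newestFirst.subList(0, keep));
    }

    public static void main(String[] args) {
        // Max versions is 3, but 4 versions were written and not yet compacted.
        List<Long> beforeCompaction = Arrays.asList(4L, 3L, 2L, 1L);
        List<Long> afterCompaction = compact(beforeCompaction, 3);
        LongPredicate skipNewest = ts -> ts != 4L;

        System.out.println(filterThenCount(beforeCompaction, 3, skipNewest)); // [3, 2, 1]
        System.out.println(filterThenCount(afterCompaction, 3, skipNewest));  // [3, 2]
    }
}
```

The same query returns timestamp 1 before compaction and not after it, which is the inconsistent result the issue title refers to.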
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056860#comment-16056860 ]

Duo Zhang commented on HBASE-17125:

And we have already spoken about this at HBaseCon; there were no big concerns. I think most users just do not care about it, as usually we only have one version, just like the user who asked on the mailing list.

Thanks.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056839#comment-16056839 ]

Duo Zhang commented on HBASE-17125:

{quote}
Am still not in favor of asking the user to configure some extra Filter to get an expected behavior from the system
{quote}
I'd say again that the javadoc never guaranteed the current behavior, and no doubt it is a broken semantic. And see my comment above: I think using another filter to address the problem introduced by a filter is the right direction. We should not put too much complexity into our core system.

And see my comment above, a real use case which shows that the current approach can solve the user's problem:
{quote}
Oh, it seems the user calls setMaxVersions(1). I believe the problem is that he/she found that the filter returns old values, then used setMaxVersions(1) hoping this would solve the problem. So it is clear that in this user's mind, setMaxVersions should be used to control the number of versions passed to the filter. This is exactly what we provide in the latest patch. With the patch in place, the user does not need to call setMaxVersions(1) anymore.

Thanks.
{quote}
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056801#comment-16056801 ]

Ted Yu commented on HBASE-17125:

When I tested my tentative patch, TestImportTSVWithVisibilityLabels hung.
TestImportTSVWithVisibilityLabels#testBulkOutputWithTsvImporterTextMapperWithInvalidLabels seems to be the last subtest running before I killed the surefire process.

FYI
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056759#comment-16056759 ]

Guanghao Zhang commented on HBASE-17125:

bq. Here is snippet showing the concept of slack:
bq. IMHO we should address this issue on our own (Like what Ted is trying to suggest here).
Ok. Let me try it.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056042#comment-16056042 ]

Anoop Sam John commented on HBASE-17125:

IMHO we should address this issue on our own (like what Ted is trying to suggest here). I am still not in favor of asking the user to configure some extra Filter to get an expected behavior from the system. Can we please think in that direction too? Ways to do so were also mentioned above.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055765#comment-16055765 ]

Ted Yu commented on HBASE-17125:

Here is a snippet showing the concept of slack:
https://pastebin.com/diBC8B9M

In UserScanQueryMatcher, around line 163, the call to retract() should be added:
{code}
    switch (filterResponse) {
      case SKIP:
        if (colChecker == MatchCode.INCLUDE) {
          columns.retract();
          return MatchCode.SKIP;
{code}
The above is tentative - certain unit tests don't pass yet.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055179#comment-16055179 ]

Guanghao Zhang commented on HBASE-17125:

The findbugs and whitespace warnings were introduced by generated protobuf code. And the failed tests were not related. They passed locally. Thanks.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055141#comment-16055141 ]

Guanghao Zhang commented on HBASE-17125:

bq. which test in the patch exercises the above scenario ?
The SpecifiedNumVersionsColumnFilter controls how many versions of a column will be returned. I added a unit test, TestFromClientSide#testSpecifiedNumVersionsColumnFilter, for it. Thanks.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054123#comment-16054123 ]

Ted Yu commented on HBASE-17125:

For my previous comment:
https://issues.apache.org/jira/browse/HBASE-17125?focusedCommentId=15976972=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15976972

which test in the patch exercises the above scenario ?
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053863#comment-16053863 ]

Hadoop QA commented on HBASE-17125:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 14s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
| 0 | mvndep | 0m 27s | Maven dependency ordering for branch |
| +1 | mvninstall | 3m 48s | master passed |
| +1 | compile | 1m 56s | master passed |
| +1 | checkstyle | 16m 58s | master passed |
| +1 | mvneclipse | 0m 45s | master passed |
| -1 | findbugs | 2m 26s | hbase-protocol-shaded in master has 24 extant Findbugs warnings. |
| +1 | javadoc | 1m 19s | master passed |
| 0 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 43s | the patch passed |
| +1 | compile | 1m 35s | the patch passed |
| +1 | cc | 1m 35s | the patch passed |
| +1 | javac | 1m 35s | the patch passed |
| +1 | checkstyle | 12m 51s | the patch passed |
| +1 | mvneclipse | 0m 37s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| +1 | hadoopcheck | 32m 42s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. |
| +1 | hbaseprotoc | 1m 12s | the patch passed |
| +1 | findbugs | 6m 23s | the patch passed |
| +1 | javadoc | 0m 7s | hbase-protocol-shaded in the patch passed. |
| +1 | javadoc | 0m 19s | hbase-client generated 0 new + 1 unchanged - 1 fixed = 1 total (was 2) |
| +1 | javadoc | 0m 31s | hbase-server in the patch passed. |
| +1 | unit | 0m 32s | hbase-protocol-shaded in the patch passed. |
| +1 | unit | 2m 41s | hbase-client in the patch passed. |
| -1 | unit | 125m 7s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 54s | The patch does not generate ASF License warnings. |
| | | 220m 29s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.security.token.TestZKSecretWatcher |
| | hadoop.hbase.master.procedure.TestMasterProcedureWalLease |

|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053587#comment-16053587 ]

Guanghao Zhang commented on HBASE-17125:

Added an 011 patch rebased on the latest master branch. [~anoop.hbase] [~stack] Any more concerns? If there are no objections, I will commit it tomorrow. Thanks.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006261#comment-16006261 ]

Guanghao Zhang commented on HBASE-17125:

Ping [~anoop.hbase] [~lhofhansl] [~stack]...
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15996287#comment-15996287 ]

Guanghao Zhang commented on HBASE-17125:

[~anoop.hbase] Any more concerns?
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15996244#comment-15996244 ]

Hadoop QA commented on HBASE-17125:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 22s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
| 0 | mvndep | 0m 27s | Maven dependency ordering for branch |
| +1 | mvninstall | 3m 42s | master passed |
| +1 | compile | 1m 32s | master passed |
| +1 | checkstyle | 11m 53s | master passed |
| +1 | mvneclipse | 0m 34s | master passed |
| -1 | findbugs | 2m 13s | hbase-protocol-shaded in master has 24 extant Findbugs warnings. |
| +1 | javadoc | 0m 58s | master passed |
| 0 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 40s | the patch passed |
| +1 | compile | 1m 32s | the patch passed |
| +1 | cc | 1m 32s | the patch passed |
| +1 | javac | 1m 32s | the patch passed |
| +1 | checkstyle | 11m 44s | the patch passed |
| +1 | mvneclipse | 0m 34s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| +1 | hadoopcheck | 34m 17s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. |
| +1 | hbaseprotoc | 1m 18s | the patch passed |
| +1 | findbugs | 6m 52s | the patch passed |
| +1 | javadoc | 0m 58s | the patch passed |
| +1 | unit | 0m 32s | hbase-protocol-shaded in the patch passed. |
| +1 | unit | 2m 29s | hbase-client in the patch passed. |
| +1 | unit | 128m 59s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 59s | The patch does not generate ASF License warnings. |
| | | 217m 40s | |

|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12866316/HBASE-17125.master.010.patch |
| JIRA Issue | HBASE-17125 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile cc hbaseprotoc |
| uname | Linux 860de2b1a771 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980756#comment-15980756 ]

Duo Zhang commented on HBASE-17125:

'If not user filter,' -> 'If not use filter'. Any other concerns? [~anoop.hbase]. Thanks.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978817#comment-15978817 ]

Hadoop QA commented on HBASE-17125:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 37s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
| 0 | mvndep | 0m 27s | Maven dependency ordering for branch |
| +1 | mvninstall | 3m 37s | master passed |
| +1 | compile | 1m 33s | master passed |
| +1 | checkstyle | 11m 42s | master passed |
| +1 | mvneclipse | 0m 34s | master passed |
| -1 | findbugs | 2m 15s | hbase-protocol-shaded in master has 24 extant Findbugs warnings. |
| +1 | javadoc | 0m 56s | master passed |
| 0 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 40s | the patch passed |
| +1 | compile | 1m 32s | the patch passed |
| +1 | cc | 1m 32s | the patch passed |
| +1 | javac | 1m 32s | the patch passed |
| +1 | checkstyle | 11m 43s | the patch passed |
| +1 | mvneclipse | 0m 34s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| +1 | hadoopcheck | 33m 29s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. |
| +1 | hbaseprotoc | 1m 18s | the patch passed |
| +1 | findbugs | 6m 12s | the patch passed |
| +1 | javadoc | 0m 57s | the patch passed |
| +1 | unit | 0m 31s | hbase-protocol-shaded in the patch passed. |
| +1 | unit | 2m 28s | hbase-client in the patch passed. |
| -1 | unit | 121m 53s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 51s | The patch does not generate ASF License warnings. |
| | | 208m 56s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.snapshot.TestMobExportSnapshot |
| | hadoop.hbase.snapshot.TestExportSnapshot |
| | hadoop.hbase.snapshot.TestMobSecureExportSnapshot |

|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12864479/HBASE-17125.master.009.patch |
| JIRA Issue | HBASE-17125 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile cc hbaseprotoc |
| uname | Linux
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978806#comment-15978806 ]

Chia-Ping Tsai commented on HBASE-17125:

TestWalAndCompactingMemStoreFlush is traced by HBASE-17943.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
    [ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978609#comment-15978609 ]

Hadoop QA commented on HBASE-17125:
-----------------------------------

| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 20s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
| 0 | mvndep | 0m 29s | Maven dependency ordering for branch |
| +1 | mvninstall | 3m 26s | master passed |
| +1 | compile | 1m 24s | master passed |
| +1 | checkstyle | 11m 2s | master passed |
| +1 | mvneclipse | 0m 32s | master passed |
| -1 | findbugs | 2m 5s | hbase-protocol-shaded in master has 24 extant Findbugs warnings. |
| +1 | javadoc | 0m 53s | master passed |
| 0 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 33s | the patch passed |
| +1 | compile | 1m 29s | the patch passed |
| +1 | cc | 1m 29s | the patch passed |
| +1 | javac | 1m 29s | the patch passed |
| +1 | checkstyle | 11m 2s | the patch passed |
| +1 | mvneclipse | 0m 31s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| +1 | hadoopcheck | 28m 38s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha2. |
| +1 | hbaseprotoc | 1m 2s | the patch passed |
| +1 | findbugs | 5m 33s | the patch passed |
| +1 | javadoc | 0m 50s | the patch passed |
| +1 | unit | 0m 26s | hbase-protocol-shaded in the patch passed. |
| +1 | unit | 2m 16s | hbase-client in the patch passed. |
| -1 | unit | 112m 22s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 48s | The patch does not generate ASF License warnings. |
| | | 190m 27s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestWalAndCompactingMemStoreFlush |
| | hadoop.hbase.snapshot.TestExportSnapshot |

|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12864458/HBASE-17125.master.009.patch |
| JIRA Issue | HBASE-17125 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile cc hbaseprotoc |
| uname | Linux 2f81ad5ac9e7 3.13.0-106-generic #153-Ubuntu SMP Tue
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
    [ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978505#comment-15978505 ]

Guanghao Zhang commented on HBASE-17125:
----------------------------------------

Updated the javadoc in the latest 009 patch.

> Inconsistent result when use filter to read data
> ------------------------------------------------
>
>                 Key: HBASE-17125
>                 URL: https://issues.apache.org/jira/browse/HBASE-17125
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: example.diff, HBASE-17125.master.001.patch,
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch,
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch,
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch,
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch,
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch
>
>
> Assume a column's max versions is 3, and we write 4 versions of this column.
> The oldest version is not removed immediately, but from the user's view it
> is gone. When a user queries with a filter and the filter skips a newer
> version, the oldest version is seen again. After the region is compacted,
> the oldest version is never seen again. This is confusing for the user: the
> query returns inconsistent results before and after region compaction.
> The cause is the matchColumn method of UserScanQueryMatcher. It first checks
> the cell with the filter, then checks the number of versions needed. So if
> the filter skips a newer version, the oldest version is seen again as long
> as it has not been removed.
> After an offline discussion with [~Apache9] and [~fenghh], we have two ideas
> for solving this problem. The first idea is to check the number of versions
> first, then check the cell with the filter. This matches the comment on
> setFilter, which says the filter is called after all tests for ttl, column
> match, deletes and max versions have been run.
> {code}
>   /**
>    * Apply the specified server-side filter when performing the Query.
>    * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>    * for ttl, column match, deletes and max versions have been run.
>    * @param filter filter to run on the server
>    * @return this for invocation chaining
>    */
>   public Query setFilter(Filter filter) {
>     this.filter = filter;
>     return this;
>   }
> {code}
> But this idea has another problem. Suppose a column's max versions is 5 and
> the user's query needs only 3 versions. The matcher first checks the number
> of versions, then checks the cell with the filter, so the result may contain
> fewer than 3 cells even though there are 2 more versions that are never read.
> So the second idea has three steps.
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the number of versions the user needs.
> But this will make ScanQueryMatcher more complicated, and it will break
> the javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.
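The inconsistency described in the issue (filter first versus version check first) can be reproduced in a small stand-alone model. This is a sketch under stated assumptions, not the UserScanQueryMatcher code: the class and method names are hypothetical, and cells are modeled as integer timestamps listed newest first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of one column's cells, newest first. Hypothetical, not HBase code. */
class VersionOrderSketch {

  /** Behavior described as the bug: run the filter first, then count versions. */
  static List<Integer> filterThenCount(List<Integer> newestFirst,
      Predicate<Integer> filter, int maxVersions) {
    List<Integer> out = new ArrayList<>();
    for (int ts : newestFirst) {
      if (!filter.test(ts)) continue;       // a skipped cell does not use up a version slot
      if (out.size() == maxVersions) break;
      out.add(ts);
    }
    return out;
  }

  /** First idea from the issue: count versions first, then run the filter. */
  static List<Integer> countThenFilter(List<Integer> newestFirst,
      Predicate<Integer> filter, int maxVersions) {
    List<Integer> out = new ArrayList<>();
    int seen = 0;
    for (int ts : newestFirst) {
      if (seen == maxVersions) break;       // every cell uses a slot, filtered or not
      seen++;
      if (filter.test(ts)) out.add(ts);
    }
    return out;
  }

  public static void main(String[] args) {
    List<Integer> beforeCompaction = List.of(4, 3, 2, 1); // 4 versions written, max versions 3
    List<Integer> afterCompaction = List.of(4, 3, 2);     // compaction removed version 1
    Predicate<Integer> skipNewest = ts -> ts != 4;

    System.out.println(filterThenCount(beforeCompaction, skipNewest, 3)); // [3, 2, 1]
    System.out.println(filterThenCount(afterCompaction, skipNewest, 3));  // [3, 2]
    System.out.println(countThenFilter(beforeCompaction, skipNewest, 3)); // [3, 2]
    System.out.println(countThenFilter(afterCompaction, skipNewest, 3));  // [3, 2]
  }
}
```

With the filter applied first, version 1 reappears before compaction and disappears afterwards; with the version check first, both reads agree.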
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
    [ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978373#comment-15978373 ]

Duo Zhang commented on HBASE-17125:
-----------------------------------

If no filter is used, get up to the specified number of versions of each column. If a filter is used, this is the maximum number of versions of each column that will be checked by the filter, so the scan may return fewer versions than the value you set here, because the filter may filter out some cells. If you want to get a specific number of versions for each column after filtering, please call setMaxVersions() and use SpecifiedNumVersionsColumnFilter. Note that the SpecifiedNumVersionsColumnFilter should be placed at the last position in the FilterList to make sure it is checked last.

Better? It also seems that Get has this method, so the wording needs to change from "scan" to "get" there if possible. Thanks.
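The advice to place a version-limiting filter last in a filter list can be illustrated with a toy model. Nothing here is HBase API: `firstN` is a hypothetical stand-in for a stateful count-limiting filter, and `apply` assumes a short-circuiting "all filters must pass" evaluation, which is an assumption of this sketch rather than a statement about FilterList internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of filter ordering. Hypothetical, not HBase code. */
class CountingFilterOrderSketch {

  /** Stateful filter that accepts only the first n cells it is asked about. */
  static Predicate<Integer> firstN(int n) {
    int[] asked = {0};
    return ts -> asked[0]++ < n;
  }

  /** Keeps a cell only if every filter accepts it; stops at the first rejection. */
  static List<Integer> apply(List<Integer> cells, List<Predicate<Integer>> filters) {
    List<Integer> out = new ArrayList<>();
    for (int ts : cells) {
      boolean keep = true;
      for (Predicate<Integer> f : filters) {
        if (!f.test(ts)) {
          keep = false;
          break; // later filters never see this cell
        }
      }
      if (keep) out.add(ts);
    }
    return out;
  }

  public static void main(String[] args) {
    List<Integer> cells = List.of(5, 4, 3, 2, 1);
    Predicate<Integer> oddOnly = ts -> ts % 2 == 1;

    // Counting filter last: it only counts cells the other filter accepted.
    System.out.println(apply(cells, List.of(oddOnly, firstN(2)))); // [5, 3]
    // Counting filter first: it spends a slot on cell 4, which is rejected afterwards.
    System.out.println(apply(cells, List.of(firstN(2), oddOnly))); // [5]
  }
}
```

In this model, putting the counting filter first wastes one of its two slots on a cell that a later filter rejects, so fewer cells come back than requested.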
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
    [ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978169#comment-15978169 ]

Guanghao Zhang commented on HBASE-17125:
----------------------------------------

The right javadoc of setMaxVersions is:
  /**
   * If no filter is used, get up to the specified number of versions of each column.
   * If a filter is used, this is the maximum number of versions of each column that the
   * filter will check, so the scan may return fewer than the maximum versions for each
   * column. You can add a SpecifiedNumVersionsColumnFilter to get the specified number
   * of versions of each column.
   * @param maxVersions maximum versions for each column
   * @return this
   */
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
    [ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978099#comment-15978099 ]

Duo Zhang commented on HBASE-17125:
-----------------------------------

Oh, it seems the user calls setMaxVersions(1). I believe the user found that the filter returns old values, then called setMaxVersions(1) hoping it would solve the problem. So it is clear that, in this user's mind, setMaxVersions should control the number of versions passed to the filter. This is exactly what we provide in the latest patch. With the patch in place, the user does not need to call setMaxVersions(1) anymore.

Thanks.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
    [ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978098#comment-15978098 ]

Guanghao Zhang commented on HBASE-17125:
----------------------------------------

In the user mailing list case, the column keeps 1 version, so the user does not need to call setMaxVersions(). If the most recent value does not match, he will get nothing.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
    [ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978092#comment-15978092 ]

Duo Zhang commented on HBASE-17125:
-----------------------------------

{quote}
So there also the user has to set this filter on get/scan and setMaxVersions()?
{quote}
No. After the patch here the problem is gone. Just keep the code as it is and everything will be OK.

The drawback of this fix is that setMaxVersions can no longer be used to control the number of returned versions if you use a filter (maybe). So we introduce a SpecifiedNumVersionsColumnFilter to solve that problem.

Thanks.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
    [ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978087#comment-15978087 ]

Anoop Sam John commented on HBASE-17125:
----------------------------------------

On the user@ mailing list there was a query from a user regarding a similar issue while using a value filter. The most recent value of a cell does not match the filter, yet he still gets an older version of the cell. He needs only one version (the latest). So there also the user has to set this filter on get/scan and setMaxVersions()?
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
    [ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978085#comment-15978085 ]

Duo Zhang commented on HBASE-17125:
-----------------------------------

{quote}
So setMaxVersions with any value >= 5 (not only 5), then the server can check all versions.
{quote}
Yes, just call scan.setMaxVersions(); there is no need to give a specific value.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
    [ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978080#comment-15978080 ]

Guanghao Zhang commented on HBASE-17125:
----------------------------------------

bq. It will so complicated for a user to set this as 5 and then the filter with 3.

The default number of versions is 1, so the scan only checks the latest version. The user needs to set a bigger value if he wants to read more than one version. In this scenario the column's versions is 5, so with setMaxVersions set to any value >= 5 (not only 5), the server can check all versions.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978078#comment-15978078 ]

Duo Zhang commented on HBASE-17125:

Why would users need to call setMaxVersions(5) if they only want 3 versions?

The fact is, if you do not use a filter, just use setMaxVersions to control the number of versions returned. If you do use a filter, please use SpecifiedNumVersionsColumnFilter to control the number of versions returned, because max versions is tested before the filter. In that case setMaxVersions controls the number of versions passed to the filter.

I think this is clear enough? Thanks.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978067#comment-15978067 ]

Anoop Sam John commented on HBASE-17125:

I don't think it is correct to ask users to call setMaxVersions(5). It will be so complicated for a user to set this to 5 and then the filter to 3. This is like passing our implementation headache on to the user. IMHO.
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15978000#comment-15978000 ]

Guanghao Zhang commented on HBASE-17125:

bq. If user calls scan#setMaxVersions(5), server would check more versions (than 3). However, there is a chance that more than 3 versions would be returned.

This can be addressed by scan.setFilter(new SpecifiedNumVersionsColumnFilter(3)). setMaxVersions controls how many versions will be checked, and SpecifiedNumVersionsColumnFilter controls how many versions will be returned.
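The split between "versions checked" and "versions returned" can be illustrated with a small stand-alone model. This is only a sketch under assumptions: it does not call the HBase client, the class and method names are hypothetical, `versionsChecked` stands in for the value given to setMaxVersions, and `versionsReturned` stands in for the limit the thread attributes to SpecifiedNumVersionsColumnFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

/** Toy model of "versions checked" versus "versions returned". Not HBase code. */
final class CheckedVsReturnedSketch {

  /**
   * @param newestFirst      timestamps of one column, newest first
   * @param versionsChecked  how many cells may reach the user filter (models setMaxVersions)
   * @param userFilter       the user's own predicate on a cell
   * @param versionsReturned cap on returned cells (models the count-limiting filter)
   */
  static List<Long> scan(List<Long> newestFirst, int versionsChecked,
                         LongPredicate userFilter, int versionsReturned) {
    List<Long> out = new ArrayList<>();
    int checked = 0;
    for (long ts : newestFirst) {
      if (checked == versionsChecked || out.size() == versionsReturned) {
        break;
      }
      checked++; // every examined cell counts, even if the filter rejects it
      if (userFilter.test(ts)) {
        out.add(ts);
      }
    }
    return out;
  }
}
```

For a column holding five versions and a filter that accepts every other cell, checking only 3 versions yields two cells, while checking 5 and capping the result at 3 yields three. This is the trade-off being debated in the thread: the user gets full results only by setting both knobs.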
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977993#comment-15977993 ]

Ted Yu commented on HBASE-17125:

Let's look at the scenario again:

bq. if a column's max version is 5 and the user query only need 3 versions

If the user calls scan#setMaxVersions(5), the server would check more versions (than 3). However, there is a chance that more than 3 versions would be returned. Instead of letting the user deal with the slack, it would be better to handle this on the server side.

My proposal only involves a few lines of change to your latest patch - though there may be some unit test failure(s).
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977981#comment-15977981 ]

Guanghao Zhang commented on HBASE-17125:

bq. Can we pass this information (let's call it the slack) to ColumnTracker ctor ?

The javadoc of scan.setMaxVersions has been changed. If the user sets max versions lower than the column's max versions, it means the user does not want to check all versions. So I think we don't need this information?
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977972#comment-15977972 ]

Guanghao Zhang commented on HBASE-17125:

bq. How is the above addressed in the current patch ?

Now scan.setMaxVersions controls how many versions will be checked. So this can be addressed by scan.setMaxVersions(5).
[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data
[ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15976972#comment-15976972 ]

Ted Yu commented on HBASE-17125:

bq. has another problem, if a column's max version is 5 and the user query only need 3 versions. It first check the version's number, then check the cell by filter. So the cells number of the result may less than 3. But there are 2 versions which don't read anymore.

How is the above addressed in the current patch ?

Currently the max versions is obtained this way (see UserScanQueryMatcher):
{code}
    int maxVersions = scan.isRaw() ? scan.getMaxVersions()
        : Math.min(scan.getMaxVersions(), scanInfo.getMaxVersions());
{code}
The column tracker loses some information when the column's max versions is greater than the value specified in the Scan.
Can we pass this information to ColumnTracker so that the column tracker can return richer information (through a tuple, e.g.) from checkVersions() ?
{code}
      colChecker = columns.checkVersions(cell, timestamp, typeByte, false);
{code}
That way, when filterResponse is SKIP, we can utilize the extra information to address the scenario described above.
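The information that the `Math.min` expression discards can be made explicit with a few lines of arithmetic. The sketch below is an illustration only: the class and method names are hypothetical, and defining the "slack" as the column's limit minus the scan's limit (never below zero) is an assumption, since the thread names the slack but does not define it precisely.

```java
/** Toy arithmetic for the effective version limit and the "slack". Not HBase code. */
final class MaxVersionsSlackSketch {

  /** Mirrors the quoted expression: raw scans ignore the column's own limit. */
  static int effectiveMaxVersions(boolean raw, int scanMaxVersions, int columnMaxVersions) {
    return raw ? scanMaxVersions : Math.min(scanMaxVersions, columnMaxVersions);
  }

  /**
   * Assumed definition of the slack: versions the column still keeps
   * beyond what the scan asked for.
   */
  static int slack(int scanMaxVersions, int columnMaxVersions) {
    return Math.max(0, columnMaxVersions - scanMaxVersions);
  }
}
```

In the scenario from the comment (column limit 5, scan limit 3) the effective limit is 3 and the slack under this definition is 2, which corresponds to the two versions that are never read once a filter skips a newer cell.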