[ https://issues.apache.org/jira/browse/PHOENIX-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bin Shi updated PHOENIX-4916:
-----------------------------
    Description:
In DefaultStatisticsCollector.collectStatistics(...), the collector iterates over the cells of the current row. Once the accumulated estimated size plus the size of the current cell is >= the guide post width, it skips all remaining cells of that row. As a result, the estimated size of a guide post may count only some of the cells of the last row.

This problem can be ignored on clusters with real data, where the guide post width is much larger than the row size, but it does affect unit tests and integration tests: they use a very small guide post width, which makes the estimated size of the query inaccurate.

  was:
In DefaultStatisticsCollector.collectStatistics(...), it iterate all cells of the current row, once the accumulated estimated size plus the size of the current cell >= guide post width, it skipped all the remaining cells. The result is that he estimated size of a guide post may only count part of cells of the last row.

This problem can be ignored in clusters with real data where the guide post width is much bigger than the row size, but it does have impact on unit test and iteration test, because we use very small guide post width in the test which results in inaccuracy of the estimated size of the query.

> When collecting statistics, the estimated size of a guide post may only count part of cells of the last row
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4916
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4916
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Bin Shi
>            Assignee: Bin Shi
>            Priority: Major
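
The undercount described in this issue can be illustrated with a small, self-contained Java sketch. It is not the Phoenix code: the class name GuidePostSizeSketch, both method names, and the plain long[] of cell sizes are hypothetical, and it assumes the cell that crosses the guide post width is itself counted before the rest of the row is skipped.

```java
// Hypothetical sketch of the behaviour described in PHOENIX-4916.
// This is NOT the DefaultStatisticsCollector implementation; names and
// the long[] representation of a row's cell sizes are assumptions.
final class GuidePostSizeSketch {

    // Models the reported behaviour: stop adding cells of the row as soon as
    // the running size reaches the guide post width, so later cells in the
    // same row are never counted.
    static long partialRowEstimate(long accumulated, long[] cellSizes, long guidePostWidth) {
        for (long cellSize : cellSizes) {
            accumulated += cellSize;
            if (accumulated >= guidePostWidth) {
                break; // remaining cells of this row are skipped
            }
        }
        return accumulated;
    }

    // Models the expected behaviour: every cell of the row is counted,
    // regardless of where the guide post width is crossed.
    static long wholeRowEstimate(long accumulated, long[] cellSizes) {
        for (long cellSize : cellSizes) {
            accumulated += cellSize;
        }
        return accumulated;
    }

    public static void main(String[] args) {
        long[] row = {30, 30, 30}; // one row with three 30-byte cells

        // Test-sized guide post width: a third of the row is missing.
        System.out.println(partialRowEstimate(0, row, 50)); // 60
        System.out.println(wholeRowEstimate(0, row));       // 90

        // Production-sized width: the same 60-byte gap is negligible.
        System.out.println(partialRowEstimate(999_990L, row, 1_000_000L)); // 1000020
        System.out.println(wholeRowEstimate(999_990L, row));               // 1000080
    }
}
```

With a guide post width of 50 bytes the partial estimate misses 30 of the row's 90 bytes, while with a width of 1,000,000 bytes the missing 60 bytes are a tiny fraction of the total. That matches the report that the problem shows up mainly in tests that use a very small guide post width.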