[ 
https://issues.apache.org/jira/browse/PHOENIX-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bin Shi updated PHOENIX-4916:
-----------------------------
    Description: 
In DefaultStatisticsCollector.collectStatistics(...), the collector iterates 
over all cells of the current row. Once the accumulated estimated size plus 
the size of the current cell is >= the guide post width, it skips all the 
remaining cells. As a result, the estimated size of a guide post may count 
only some of the cells of the last row.

This problem can be ignored in clusters with real data, where the guide post 
width is much larger than the row size, but it does affect unit tests and 
integration tests: they use a very small guide post width, which makes the 
estimated size of the query inaccurate.

  was:
In DefaultStatisticsCollector.collectStatistics(...), it iterate all cells of 
the current row, once the accumulated estimated size plus the size of the 
current cell >= guide post width, it skipped all the remaining cells. The 
result is that  he estimated size of a guide post may only count part of cells 
of the last row.

This problem can be ignored in clusters with real data where the guide post 
width is much bigger than the row size, but it does have impact on unit test 
and iteration test, because we use very small guide post width in the test 
which results in inaccuracy of the estimated size of the query.


> When collecting statistics, the estimated size of a guide post may only count 
> part of cells of the last row
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4916
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4916
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Bin Shi
>            Assignee: Bin Shi
>            Priority: Major
>
> In DefaultStatisticsCollector.collectStatistics(...), the collector iterates 
> over all cells of the current row. Once the accumulated estimated size plus 
> the size of the current cell is >= the guide post width, it skips all the 
> remaining cells. As a result, the estimated size of a guide post may count 
> only some of the cells of the last row.
> This problem can be ignored in clusters with real data, where the guide post 
> width is much larger than the row size, but it does affect unit tests and 
> integration tests: they use a very small guide post width, which makes the 
> estimated size of the query inaccurate.
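
The effect described above can be illustrated with a small model. This is 
only a sketch, not the Phoenix code: the function name guide_post_sizes, the 
representation of a row as a list of cell sizes in bytes, and the 
count_whole_row flag are all assumptions made for illustration. It also 
assumes the cell that crosses the threshold is itself counted, and that the 
accumulator resets after each guide post.

```python
from typing import List


def guide_post_sizes(rows: List[List[int]], width: int,
                     count_whole_row: bool) -> List[int]:
    """Return the estimated size recorded for each guide post.

    rows: hypothetical model, each row is a list of cell sizes in bytes.
    width: guide post width in bytes.
    count_whole_row: False models the reported behavior (cells after the
        threshold-crossing cell are skipped); True counts the whole row.
    """
    sizes = []
    accumulated = 0
    for row in rows:
        reached = False
        for cell_size in row:
            if reached and not count_whole_row:
                break  # reported behavior: skip the rest of this row
            accumulated += cell_size
            if accumulated >= width:
                reached = True
        if reached:
            # The guide post is emitted at the end of the row that crossed
            # the width; the accumulator then starts over (assumption).
            sizes.append(accumulated)
            accumulated = 0
    return sizes


rows = [[10, 10, 10], [10, 10, 10]]   # two rows, 30 bytes each
print(guide_post_sizes(rows, 15, count_whole_row=False))
print(guide_post_sizes(rows, 15, count_whole_row=True))
```

With a guide post width of 15 bytes and 30-byte rows, the skipping variant 
records 20 bytes per guide post while whole-row counting records 30, so a 
third of the data is missing from the estimate. When the width is much larger 
than a row, at most part of one row per guide post is lost, which is why the 
discrepancy mostly shows up in tests.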



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
