Hello Bharath Vissapragada, Dimitris Tsirogiannis,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/8110
to look at the new patch set (#2).
Change subject: IMPALA-5955: Use totalSize tblproperty instead of rawDataSize.
......................................................................
IMPALA-5955: Use totalSize tblproperty instead of rawDataSize.
Today, Impala populates the 'rawDataSize' property
during COMPUTE STATS for the purpose of extrapolating
row counts based on file sizes.
After this patch, Impala populates 'totalSize' instead of
'rawDataSize'. 'rawDataSize' is no longer populated or used by Impala.
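For illustration only, the extrapolation idea can be sketched as below:
scale the row count recorded at COMPUTE STATS time by the ratio of the
current file bytes to the recorded 'totalSize'. This is a minimal sketch
under assumptions, not the code in this patch; the class, method, and
parameter names are hypothetical, and the -1 "unknown" return value is an
assumption of the sketch.

```java
/**
 * Hypothetical sketch of row-count extrapolation from file sizes.
 * Not Impala's actual implementation; all names are illustrative.
 */
public final class RowCountExtrapolationSketch {
  private RowCountExtrapolationSketch() {}

  /**
   * Estimates the current row count from stats recorded earlier.
   *
   * @param statsNumRows row count recorded when stats were computed
   * @param statsTotalSize on-disk bytes recorded when stats were computed
   * @param currentFileBytes on-disk bytes of the files being scanned now
   * @return the extrapolated row count, or -1 if the inputs are unusable
   */
  public static long extrapolateNumRows(
      long statsNumRows, long statsTotalSize, long currentFileBytes) {
    // Without a positive recorded size there is no rows-per-byte ratio.
    if (statsNumRows < 0 || statsTotalSize <= 0 || currentFileBytes < 0) {
      return -1;
    }
    // Assume rows are spread evenly across bytes on disk.
    double rowsPerByte = (double) statsNumRows / statsTotalSize;
    return Math.round(currentFileBytes * rowsPerByte);
  }
}
```

The sketch shows why an on-disk size is the natural denominator: the
numerator is also measured in file bytes, so the ratio stays consistent
regardless of encoding or compression.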
Intended meaning/use of tblproperties:
- 'rawDataSize' is the estimated in-memory size of a table
(without encoding and compression)
- 'totalSize' represents the on-disk size
Using the fields correctly is important for compatibility
with other users of the HMS such as Hive and SparkSQL.
For example, SparkSQL relies on the 'totalSize' for
join ordering.
Testing:
- core/hdfs run passed
Change-Id: If7c2c4e1e99b297c849f9f0d18b2bef34ad811c6
---
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/planner/StatsExtrapolationTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
4 files changed, 25 insertions(+), 25 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/8110/2
--
To view, visit http://gerrit.cloudera.org:8080/8110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If7c2c4e1e99b297c849f9f0d18b2bef34ad811c6
Gerrit-Change-Number: 8110
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>