Alex Behm has uploaded a new change for review. http://gerrit.cloudera.org:8080/8110
Change subject: IMPALA-5955: Use totalSize tblproperty instead of rawDataSize.
......................................................................

IMPALA-5955: Use totalSize tblproperty instead of rawDataSize.

Today, Impala populates the 'rawDataSize' property during COMPUTE STATS
so that it can later extrapolate row counts based on file sizes. That
does not match the intended meaning of the property, so this change
uses 'totalSize' instead.

Intended meaning/use of tblproperties:
- 'rawDataSize' is the estimated in-memory size of a table (without
  encoding and compression)
- 'totalSize' represents the on-disk size

Using the fields correctly is important for compatibility with other
users of the HMS such as Hive and SparkSQL. For example, SparkSQL
relies on 'totalSize' for join ordering.

Testing:
- core/hdfs run passed

Change-Id: If7c2c4e1e99b297c849f9f0d18b2bef34ad811c6
---
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/planner/StatsExtrapolationTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
4 files changed, 23 insertions(+), 23 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/8110/1
--
To view, visit http://gerrit.cloudera.org:8080/8110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: If7c2c4e1e99b297c849f9f0d18b2bef34ad811c6
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <[email protected]>
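To illustrate why the on-disk size matters for row count extrapolation, here is a minimal sketch of one plausible scheme. It assumes that COMPUTE STATS records the row count ('numRows') alongside the on-disk size ('totalSize'), and that the estimate for the current files is obtained by scaling the row count by the ratio of current file bytes to 'totalSize'. The class and method names (StatsExtrapolationSketch, extrapolateNumRows) are hypothetical; this is not the code in Table.java or CatalogOpExecutor.java.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of file-size-based row count extrapolation.
 * Assumes 'numRows' and 'totalSize' were recorded together at
 * COMPUTE STATS time. Not Impala's actual implementation.
 */
public class StatsExtrapolationSketch {
  // HMS table property keys.
  public static final String NUM_ROWS = "numRows";
  public static final String TOTAL_SIZE = "totalSize"; // on-disk bytes

  /**
   * Returns the extrapolated row count for files currently occupying
   * currentFileBytes on disk, or -1 if the needed stats are missing.
   */
  public static long extrapolateNumRows(Map<String, String> tblProps,
                                        long currentFileBytes) {
    long numRows = parseOrMinusOne(tblProps.get(NUM_ROWS));
    long totalSize = parseOrMinusOne(tblProps.get(TOTAL_SIZE));
    if (numRows < 0 || totalSize < 0 || currentFileBytes < 0) return -1;
    if (numRows == 0 || currentFileBytes == 0) return 0;
    // A non-empty table with zero recorded bytes gives no usable ratio.
    if (totalSize == 0) return -1;
    // Rows per on-disk byte observed when stats were computed.
    double rowsPerByte = (double) numRows / totalSize;
    // Never estimate zero rows for non-empty files.
    return Math.max(1L, Math.round(currentFileBytes * rowsPerByte));
  }

  /** Parses a numeric tblproperty, treating absent/invalid values as -1. */
  private static long parseOrMinusOne(String value) {
    if (value == null) return -1;
    try {
      return Long.parseLong(value);
    } catch (NumberFormatException e) {
      return -1;
    }
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put(NUM_ROWS, "1000");
    props.put(TOTAL_SIZE, "50000");
    // Files have grown from 50000 to 75000 bytes since COMPUTE STATS.
    System.out.println(extrapolateNumRows(props, 75000)); // prints 1500
  }
}
```

Because the ratio is taken against bytes on disk, it only makes sense if the stored size is the encoded, compressed file size ('totalSize'); an in-memory estimate such as 'rawDataSize' would skew the rows-per-byte figure.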
