GitHub user wzhfy opened a pull request:
https://github.com/apache/spark/pull/19880
[SPARK-22626][SQL][FOLLOWUP] improve documentation and simplify test case
## What changes were proposed in this pull request?
The reason why some Hive tables have zero `numRows` statistics is that, in Hive,
when stats gathering is disabled, `numRows` stays at zero after an INSERT command:
```
hive> create table src (key int, value string) stored as orc;
hive> desc formatted src;
Table Parameters:
COLUMN_STATS_ACCURATE {"BASIC_STATS":"true"}
numFiles 0
numRows 0
rawDataSize 0
totalSize 0
transient_lastDdlTime 1512399590
hive> set hive.stats.autogather=false;
hive> insert into src select 1, 'a';
hive> desc formatted src;
Table Parameters:
numFiles 1
numRows 0
rawDataSize 0
totalSize 275
transient_lastDdlTime 1512399647
hive> insert into src select 1, 'b';
hive> desc formatted src;
Table Parameters:
numFiles 2
numRows 0
rawDataSize 0
totalSize 550
transient_lastDdlTime 1512399687
```
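The session above shows why a consumer of these table parameters might treat a zero `numRows` as "unknown" rather than as a real row count. Below is a minimal sketch of that idea in Python. The function name `usable_row_count` and the dictionary layout are hypothetical, chosen for illustration only; this is not Spark's actual implementation or API.

```python
from typing import Mapping, Optional


def usable_row_count(params: Mapping[str, str]) -> Optional[int]:
    """Return numRows only when it is positive; otherwise treat it as unknown.

    Hypothetical helper: a zero value may simply mean that Hive did not
    gather stats on INSERT, so it is not trusted as a real row count.
    """
    raw = params.get("numRows")
    if raw is None:
        return None
    try:
        rows = int(raw)
    except ValueError:
        # Malformed value: treat as unknown rather than fail.
        return None
    return rows if rows > 0 else None


# Table parameters copied from the second `desc formatted src` output above.
after_insert = {
    "numFiles": "1",
    "numRows": "0",
    "rawDataSize": "0",
    "totalSize": "275",
}

print(usable_row_count(after_insert))
```

Under this assumption, a table with `totalSize 275` and `numRows 0` yields no row count at all, so a size-based estimate would be used instead of a misleading zero.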
## How was this patch tested?
Modified existing test.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/wzhfy/spark doc_zero_rowCount
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19880.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19880
----
commit 9be829d208f7e2d6a88b9d2008fc04eec4a4ad8e
Author: Zhenhua Wang <[email protected]>
Date: 2017-12-04T15:53:49Z
improve doc
----