In Hive 2 you can generally get partition statistics by running:

hive> analyze table sales partition (year, month) compute statistics;
Partition oraclehadoop.sales{year=2000, month=10} stats: [numFiles=256, numRows=21034, totalSize=1651890, rawDataSize=6226064]
Partition oraclehadoop.sales{year=1999, month=4} stats: [numFiles=256, numRows=16512, totalSize=1533145, rawDataSize=4887552]
Partition oraclehadoop.sales{year=1999, month=8} stats: [numFiles=256, numRows=22979, totalSize=1697346, rawDataSize=6801784]
Partition oraclehadoop.sales{year=2001, month=8} stats: [numFiles=256, numRows=23879, totalSize=1744781, rawDataSize=7068184]
Partition oraclehadoop.sales{year=1998, month=2} stats: [numFiles=256, numRows=14149, totalSize=1438496, rawDataSize=4188104]
Partition oraclehadoop.sales{year=1999, month=7} stats: [numFiles=256, numRows=21648, totalSize=1657439, rawDataSize=6407808]
Partition oraclehadoop.sales{year=1999, month=5} stats: [numFiles=256, numRows=19733, totalSize=1623643, rawDataSize=5840968]
Partition oraclehadoop.sales{year=1999, month=1} stats: [numFiles=256, numRows=20637, totalSize=1638403, rawDataSize=6108552]

Statistics are reported separately for each partition, as shown above.
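
As a rough way to roll these per-partition figures up, the Python sketch below (an illustration, not a Hive tool) parses output in the format above and adds up each counter. It assumes the exact "Partition db.table{...} stats: [...]" layout shown; the sample uses the first two partitions from the output, and any totals only cover the partitions you feed it.

```python
import re

# One partition record in the ANALYZE output shown above, after whitespace
# has been collapsed to single spaces:
#   Partition db.table{k1=v1, k2=v2} stats: [numFiles=..., numRows=..., ...]
RECORD_RE = re.compile(r"Partition (\S+?)\{(.*?)\} stats: \[(.*?)\]")


def _pairs(text):
    """Split 'a=1, b=2' into [('a', '1'), ('b', '2')]."""
    return [tuple(item.split("=", 1)) for item in text.split(", ")]


def parse_partition_stats(output):
    """Return (table, partition_spec, stats) tuples parsed from ANALYZE output."""
    flat = " ".join(output.split())  # undo any line wrapping
    records = []
    for m in RECORD_RE.finditer(flat):
        spec = dict(_pairs(m.group(2)))
        stats = {key: int(value) for key, value in _pairs(m.group(3))}
        records.append((m.group(1), spec, stats))
    return records


def sum_stats(records):
    """Add up each counter (numFiles, numRows, ...) across the given partitions."""
    totals = {}
    for _table, _spec, stats in records:
        for key, value in stats.items():
            totals[key] = totals.get(key, 0) + value
    return totals


# First two partitions from the output above, wrapped as in the original mail.
SAMPLE = """
Partition oraclehadoop.sales{year=2000, month=10} stats: [numFiles=256,
numRows=21034, totalSize=1651890, rawDataSize=6226064]
Partition oraclehadoop.sales{year=1999, month=4} stats: [numFiles=256,
numRows=16512, totalSize=1533145, rawDataSize=4887552]
"""

if __name__ == "__main__":
    print(sum_stats(parse_partition_stats(SAMPLE)))
```

Whitespace is collapsed before matching, so output that has been line-wrapped by a mail client still parses.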

Here the table is not only partitioned: each partition is also bucketed into
256 buckets, which matches numFiles=256 for every partition.
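
Bucketing assigns each row to a bucket by hashing the bucketing column and taking the result modulo the number of buckets. The sketch below mimics the legacy Hive scheme (bucketing_version 1, used before Hive 3 switched to Murmur3) for a single INT bucketing column, where the hash is the integer value itself and the bucket is (hash & Integer.MAX_VALUE) % numBuckets. The column and the values are hypothetical, since the mail does not say which column sales is clustered by.

```python
def java_int(value):
    """Wrap a Python int to a signed 32-bit value, as Java int arithmetic would."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def bucket_for_int(value, num_buckets=256):
    """Bucket number for one INT bucketing column under legacy Hive bucketing.

    Assumption: bucketing_version 1 behaviour, where an INT hashes to itself
    and the bucket is (hash & Integer.MAX_VALUE) % num_buckets.
    """
    hash_code = java_int(value)
    return (hash_code & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    # Hypothetical key values; the real bucketing column of sales is not given.
    for key in (1, 255, 256, 1000, -1):
        print(key, "->", bucket_for_int(key))
```

With 256 buckets, each bucket is normally written as its own file inside the partition directory.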

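Individual column statistics can be obtained from the metastore table
PART_COL_STATS. That table is only populated once column statistics have been
computed, e.g. with "analyze table sales partition (year, month) compute
statistics for columns;"; the command above gathers partition-level
statistics only.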
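
Because these statistics are stored per partition, a table-wide view has to be merged from the PART_COL_STATS rows, which is the point Gopal makes in the quoted reply below. The sketch below merges rows for one numeric column, using a few of the real PART_COL_STATS columns (LONG_LOW_VALUE, LONG_HIGH_VALUE, NUM_NULLS, NUM_DISTINCTS) and hypothetical values. Min, max and null counts combine exactly; distinct counts do not, so it reports a range, treating each per-partition NDV as exact: the largest NDV is a lower bound and their sum an upper bound. Hive's own aggregation (get_aggr_stats_for) uses its own estimators, which this does not reproduce.

```python
from dataclasses import dataclass


@dataclass
class PartColStats:
    """Subset of one PART_COL_STATS row for a numeric (long) column."""
    partition_name: str
    long_low_value: int
    long_high_value: int
    num_nulls: int
    num_distincts: int


@dataclass
class MergedColStats:
    low: int
    high: int
    num_nulls: int
    ndv_lower_bound: int  # largest per-partition NDV: at least this many distinct values
    ndv_upper_bound: int  # sum of per-partition NDVs: at most this many


def merge_column_stats(rows):
    """Combine per-partition stats for one column into table-wide figures."""
    if not rows:
        raise ValueError("no partition statistics to merge")
    return MergedColStats(
        low=min(r.long_low_value for r in rows),
        high=max(r.long_high_value for r in rows),
        num_nulls=sum(r.num_nulls for r in rows),
        ndv_lower_bound=max(r.num_distincts for r in rows),
        ndv_upper_bound=sum(r.num_distincts for r in rows),
    )


if __name__ == "__main__":
    # Hypothetical values for an illustrative column, not real output from sales.
    rows = [
        PartColStats("year=1999/month=1", 1, 900, 0, 850),
        PartColStats("year=1999/month=4", 3, 1200, 2, 1100),
        PartColStats("year=2000/month=10", 2, 1000, 5, 950),
    ]
    print(merge_column_stats(rows))
```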

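If you are using ORC files, the file and stripe statistics can be obtained with

 hive --orcfiledump <FILE_PATH_ON_HDFS>

Adding --rowindex with a comma-separated list of column ids also prints the
row index statistics for those columns:

 hive --orcfiledump --rowindex <COLUMN_IDS> <FILE_PATH_ON_HDFS>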


HTH


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 18 August 2016 at 21:32, Gopal Vijayaraghavan <gop...@apache.org> wrote:

> > Is there any way to access the column statistics for the whole table?
>
> There are no column statistics for the whole table - the only way to get
> them is to merge the column statistics of all the partitions.
>
> The metastore API actually exposes this (if you're looking for schema info
> to read in a program).
>
> https://hive.apache.org/javadocs/r2.0.1/api/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.Processor.get_aggr_stats_for.html
>
> +
> https://github.com/apache/hive/blob/master/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggrStatsCacheIntegration.java#L184
>
>
>
> Cheers,
> Gopal
