getNumberOfValues() returns the number of non-null values. The phrase "and
repeated values" in the documentation is incorrect and should be fixed.
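
To make the difference concrete, here is a minimal sketch in plain Java. It does not call the ORC API; the class and method names (ValueCounts, nonNullCount, distinctCount, sampleColumn) are hypothetical and exist only for illustration. It uses the twelve rows from the query output quoted below, treating each "?" as a NULL, and compares the non-null count with the count(distinct ...) reading.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Illustration only: hypothetical helper, not part of Hive or ORC.
class ValueCounts {

    // Number of non-null entries, which is what the reply says
    // getNumberOfValues() reports.
    static long nonNullCount(List<String> column) {
        return column.stream().filter(Objects::nonNull).count();
    }

    // Number of distinct non-null entries, i.e. the
    // "select count(distinct column)" interpretation.
    static long distinctCount(List<String> column) {
        Set<String> seen = new HashSet<>();
        for (String value : column) {
            if (value != null) {
                seen.add(value);
            }
        }
        return seen.size();
    }

    // The s_rec_end_date column from the quoted example; each "?" in the
    // query output is represented as null.
    static List<String> sampleColumn() {
        return Arrays.asList(
            "1999-03-13", "1999-03-13",
            "2000-03-12", "2000-03-12",
            "2001-03-12", "2001-03-12",
            null, null, null, null, null, null);
    }

    public static void main(String[] args) {
        List<String> column = sampleColumn();
        System.out.println("rows     = " + column.size());          // 12
        System.out.println("non-null = " + nonNullCount(column));   // 6
        System.out.println("distinct = " + distinctCount(column));  // 3
    }
}
```

The non-null count of 6 matches what was observed from getNumberOfValues(), while the distinct count of 3 is what the "repeated values" wording would suggest.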

.. Owen

On Wed, Apr 6, 2016 at 11:28 AM, Dave Birdsall <[email protected]>
wrote:

> Hi,
>
> I have a question about the getNumberOfValues() method of the
> ColumnStatistics interface.
>
> In the Hive documentation (for example, here:
> https://hive.apache.org/javadocs/r0.12.0/api/org/apache/hadoop/hive/ql/io/orc/ColumnStatistics.html),
> the method is described as returning “the number of values in this column”.
> Under Method Detail, it says, “it will differ from the number of rows
> because of NULL values and repeated values.”
>
> My question concerns “repeated values”.
>
> Being an SQL guy, I leap to the conclusion that getNumberOfValues()
> returns the equivalent of “select count(distinct column) from orc_table”,
> that is, the number of distinct values for that column in the table. (Well,
> for ORC it is for a particular stripe of the table, but I hope my meaning
> gets across.)
>
> But when I experiment with this API, it seems to be returning the number
> of non-null values instead. For example, using the Trafodion SQL engine to
> query an example Hive table using ORC files, I see:
>
> >>select s_rec_end_date from hive.hive.store2_orc order by s_rec_end_date;
>
> S_REC_END_DATE
> --------------
>     1999-03-13
>     1999-03-13
>     2000-03-12
>     2000-03-12
>     2001-03-12
>     2001-03-12
> ?
> ?
> ?
> ?
> ?
> ?
>
> --- 12 row(s) selected.
>
> But when I look at what ColumnStatistics.getNumberOfValues() returns for
> this column, I get 6. (This particular example table has just one stripe.)
> Looking at the values, though, there are just 3 distinct values here.
>
> So, my question is: Is it the case that
> ColumnStatistics.getNumberOfValues() returns the number of non-null values
> in a column (in a given stripe)? And the Hive documentation is incorrect
> when it mentions “repeated values”?
>
> Thanks,
>
> Dave
>
