Hi Rob,

(I'm not a Hive stats expert so please take my words with a grain of salt)

Do you mean distinct keys in aggregations? From what I can see, Hive uses
column stats (which include the number of distinct values for each column)
from each source table involved in a query, and populates the stats for a
plan based on some simple rules. It also assumes a uniform distribution for
distinct values (I think).
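
To make the uniform assumption concrete, here is a rough Python sketch. It
is not Hive's actual code, and the names ColumnStats, estimate_group_count
and estimate_rows_per_key are made up for illustration. It only shows what a
row count plus a distinct-value count can tell you if every key is assumed
to be equally frequent:

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Hypothetical container for per-column statistics (illustration only)."""
    row_count: int  # number of rows in the source table
    ndv: int        # number of distinct values in the column


def estimate_group_count(stats: ColumnStats) -> int:
    """Estimated number of distinct keys when grouping by this column.

    There cannot be more groups than rows, so the estimate is capped.
    """
    return min(stats.ndv, stats.row_count)


def estimate_rows_per_key(stats: ColumnStats) -> float:
    """Estimated rows per key, assuming every key is equally frequent."""
    if stats.ndv == 0:
        return 0.0
    return stats.row_count / stats.ndv


if __name__ == "__main__":
    stats = ColumnStats(row_count=1_000_000, ndv=500)
    print(estimate_group_count(stats))   # 500
    print(estimate_rows_per_key(stats))  # 2000.0
```

Note that an estimate like this gives one average per column, so it says
nothing about skew: a key that holds half of the rows looks the same as any
other key.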

Best,
Chao

On Wed, Dec 21, 2016 at 9:28 AM, Haas, Nichole <nichole.h...@concur.com>
wrote:

> Hi Rob,
>
> I am not a developer, but I can tell you that to generate such statistics,
> we had an intern work in Spark all last summer. So, I don’t think it is
> built into Hive.
>
> Hope this helps you,
>
> ~Nikki
>
>
>
> *From: *Robert Grandl <rgra...@yahoo.com>
> *Reply-To: *"user@hive.apache.org" <user@hive.apache.org>, Robert Grandl <
> rgra...@yahoo.com>
> *Date: *Wednesday, December 21, 2016 at 9:04 AM
> *To: *User <user@hive.apache.org>, Dev <d...@hive.apache.org>
> *Subject: *Re: Hive statistics
>
>
>
> Hi guys,
>
>
>
> I am wondering: is there any other mailing list for Hive-related questions?
>
>
>
> I feel there is not much activity on the user/dev Hive mailing lists, or at
> least not much support in answering my questions.
>
>
>
> Thanks,
>
> Robert
>
>
>
> On Tuesday, December 20, 2016 11:01 PM, Robert Grandl <rgra...@yahoo.com>
> wrote:
>
>
>
> Hi guys,
>
>
>
> I am wondering if it's possible to estimate the number of distinct keys
> and their distribution in one way or another.
>
>
>
> More concretely, for every stage, is it possible to determine the number
> of distinct keys and, for each key, the number of values before the data is
> actually processed?
>
>
>
> Thanks,
>
> Robert
