Hi Rob, (I'm not a Hive stats expert so please take my words with a grain of salt)
Do you mean distinct keys in aggregations? From what I can see, Hive uses column stats (which include the number of distinct values for each column) from each source table involved in a query, and populates the stats for the plan based on some simple rules. It also assumes a uniform distribution for distinct values (I think).

Best,
Chao

On Wed, Dec 21, 2016 at 9:28 AM, Haas, Nichole <nichole.h...@concur.com> wrote:

> Hi Rob,
>
> I am not a developer, but I can tell you that to generate such statistics,
> we had an intern work in Spark all last summer. So, I don’t think it is
> built into Hive.
>
> Hope this helps you,
>
> ~Nikki
>
> From: Robert Grandl <rgra...@yahoo.com>
> Reply-To: "user@hive.apache.org" <user@hive.apache.org>, Robert Grandl <rgra...@yahoo.com>
> Date: Wednesday, December 21, 2016 at 9:04 AM
> To: User <user@hive.apache.org>, Dev <d...@hive.apache.org>
> Subject: Re: Hive statistics
>
> Hi guys,
>
> I am wondering: is there any other mailing list for Hive-related questions?
>
> I feel there is not much activity on the user/dev Hive mailing lists, or at
> least not much support in answering my questions.
>
> Thanks,
> Robert
>
> On Tuesday, December 20, 2016 11:01 PM, Robert Grandl <rgra...@yahoo.com> wrote:
>
> Hi guys,
>
> I am wondering if it's possible to estimate the number of distinct keys
> and their distribution in one way or another.
>
> More concretely, for every stage, is it possible to determine the number
> of distinct keys and, for each key, the number of values before the data is
> actually processed?
>
> Thanks,
> Robert
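To make the "simple rules" mentioned above more concrete, here is a minimal Python sketch of the kind of textbook estimates an optimizer can derive from a table's row count and a column's number of distinct values (NDV), assuming distinct values are uniformly distributed. The `ColumnStats` class, the function names, and the numbers below are hypothetical and only for illustration; Hive's actual rules live in its optimizer and may differ in details (null handling, multi-column keys, and so on). The inputs correspond to what Hive collects with `ANALYZE TABLE <table> COMPUTE STATISTICS FOR COLUMNS;` and reports via `DESCRIBE FORMATTED <table> <column>;`.

```python
import math
from dataclasses import dataclass


@dataclass
class ColumnStats:
    # Hypothetical container: row count of the table and the
    # number of distinct values (NDV) of one column.
    num_rows: int
    ndv: int


def values_per_key(stats: ColumnStats) -> float:
    """Uniformity assumption: every distinct key gets the same share of rows."""
    if stats.ndv == 0:
        return 0.0
    return stats.num_rows / stats.ndv


def group_by_output_rows(num_rows: int, key_ndvs: list) -> int:
    """The number of groups can exceed neither the product of the
    key columns' NDVs nor the number of input rows."""
    return min(math.prod(key_ndvs), num_rows)


def equi_join_rows(left: ColumnStats, right: ColumnStats) -> int:
    """Classic equi-join estimate: |L| * |R| / max(ndv_L, ndv_R)."""
    denom = max(left.ndv, right.ndv)
    if denom == 0:
        return 0
    return (left.num_rows * right.num_rows) // denom


if __name__ == "__main__":
    # Hypothetical tables: orders.customer_id and customers.id.
    orders = ColumnStats(num_rows=1_000_000, ndv=50_000)
    customers = ColumnStats(num_rows=50_000, ndv=50_000)
    print(values_per_key(orders))                          # 20.0
    print(group_by_output_rows(1_000_000, [50_000]))       # 50000
    print(group_by_output_rows(1_000_000, [50_000, 100]))  # 1000000
    print(equi_join_rows(orders, customers))               # 1000000
```

Note that these are only estimates: because uniformity is assumed, every key is predicted to have the same number of values, so skewed keys are invisible. That is why column stats alone will not give exact per-key counts before the data is processed.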