Re: computing median and percentiles

2014-03-19 Thread Seema Datar
Not really. If it was a single column with no counters, Hive provides an option to use percentile. So basically if the data was like - 100 100 200 200 200 200 300. But if we have 2 columns, one that maintains the value and the other that maintains the count, how can Hive be used to derive the per
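A minimal sketch of one way to do this in HiveQL, assuming a table freq(value INT, cnt INT) holding the value/count pairs (names are placeholders, and cnt is assumed to be >= 1): expand each row cnt times with space()/split()/explode(), then apply the built-in percentile() UDAF to the expanded rows.

    SELECT percentile(CAST(value AS BIGINT), 0.25) AS p25
    FROM (
      -- space(n) returns n spaces; splitting on ' ' yields n+1 elements,
      -- so space(cnt - 1) expands each (value, cnt) row into cnt rows
      SELECT value
      FROM freq
      LATERAL VIEW explode(split(space(cnt - 1), ' ')) e AS dummy
    ) expanded;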

Re: Issue with Querying External Hive Table created on hbase

2014-03-19 Thread Navis류승우
You can check the exact reason from the job log, but generally it's caused by missing libs in the auxlib conf. That includes hive-hbase-handler.jar, hbase-*.jar, guava-*.jar, zookeeper-*.jar, etc., varying with the version of your Hive and HBase. Thanks, Navis 2014-03-20 3:42 GMT+09:00 Sunil Ranka : > Hi Al
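For a quick check without touching the auxlib config, the same jars can also be added per-session from the Hive CLI; the paths and version strings below are placeholders, not taken from the original post.

    ADD JAR /path/to/hive-hbase-handler-<version>.jar;
    ADD JAR /path/to/hbase-<version>.jar;
    ADD JAR /path/to/guava-<version>.jar;
    ADD JAR /path/to/zookeeper-<version>.jar;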

Re: computing median and percentiles

2014-03-19 Thread Stephen Sprague
Not a Hive question, is it? It's more like a math question. On Wed, Mar 19, 2014 at 1:30 PM, Seema Datar wrote: > > > I understand the percentile function is supported in Hive in the latest > versions. However, how does one calculate percentiles when the data is > spread across two columns. So say

Re: computing median and percentiles

2014-03-19 Thread Seema Datar
I understand the percentile function is supported in Hive in the latest versions. However, how does one calculate percentiles when the data is spread across two columns? So say - Value Count: 100 2 (so basically 100 occurred twice), 200 4, 300 1, 400 6, 500 3. I want to find out the 0.25 percentile

Re: org.apache.hadoop.hive.metastore.HiveMetaStoreClient with webhcat REST

2014-03-19 Thread Eugene Koifman
The URL to describe a table should be .../database//table/? but your exception happens before the URL problem. Have you checked the templeton.hive.properties property in webhcat-site.xml? Does hive.metastore.uris there point at the right location for your metastore? templeton.libjars is translate

Issue with Querying External Hive Table created on hbase

2014-03-19 Thread Sunil Ranka
Hi All, I am trying to query an external Hive table created on HBase (the HBase table is compressed using "gzip"). I get a quick response if I use "select * from hbase_acct_pref_dim_", but the query takes forever if I try to retrieve data based on the row_key. hive> select * from hbase
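A sketch of the kind of table definition where a row_key predicate can be answered with an HBase key lookup rather than a full scan; the table, column family, and column names here are placeholders, not taken from the original post. The first Hive column must be mapped to ":key" in hbase.columns.mapping, and the filter must be on that column.

    CREATE EXTERNAL TABLE hbase_acct_pref_dim (
      row_key   STRING,
      acct_pref STRING
    )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:acct_pref")
    TBLPROPERTIES ("hbase.table.name" = "acct_pref_dim");

    -- an equality predicate on the :key-mapped column can be pushed down to HBase
    SELECT * FROM hbase_acct_pref_dim WHERE row_key = 'some_row_key';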

Hive 13

2014-03-19 Thread Bryan Jeffrey
Hello. Is there a firm release date for Hive 13? I know there was talk several weeks ago about cutting a branch and looking at stability. Regards, Bryan Jeffrey

Re: Best way to avoid cross join

2014-03-19 Thread Nitin Pawar
From the mail thread's last line: the correct fix would be to have 1 reducer in case of a Cartesian product; the hack to avoid this was (1 = 1). I think that's taken care of by the hash partitioner sending everything to a single reducer. The other option (at least for me) looks like going to a Pig script. Never tried so yo

Re: Best way to avoid cross join

2014-03-19 Thread fab wol
Hey Nitin, Yong wrote exactly the opposite in his first sentence: *Cross join doesn't mean Hive has to use one reducer.* And this super old thread here also lets me assume that more than one reducer can be used: http://mail-archives.apache.org/mod_mbox/hive-user/200904.mbox/%3ca132f89f9b9df

Attempt to archive a partition in hive fails

2014-03-19 Thread Rupinder Singh
Hi, I am trying to archive a partition in a Hive table, but it keeps failing. Env: Hadoop 2.2.0, Amazon EMR, Hive 0.11.0. The sequence of commands in Hive is: hive> set hive.archive.enabled=true; hive> set hive.archive.har.parentdir.settable=true; hive> set har.partfile.size=1073741824; hive>
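For reference, the archive step itself (after the set commands above) typically looks like the following; the table name and partition spec are placeholders, not the ones from the original post.

    ALTER TABLE my_table ARCHIVE PARTITION (dt='2014-03-19');
    -- and the reverse operation:
    ALTER TABLE my_table UNARCHIVE PARTITION (dt='2014-03-19');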

Re: Best way to avoid cross join

2014-03-19 Thread Nitin Pawar
Hey Wolli, sorry, missed this one. As Yong already replied, cross join always uses only one reducer. If you want to avoid this, can you try making it a full outer join with the on condition (1 = 1) and see if you get your desired result? On Wed, Mar 19, 2014 at 4:05 PM, fab wol wrote: > anyon
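A sketch of the workaround being suggested, with placeholder column names (import1 and et_keywords are the tables mentioned later in the thread). Note that some Hive versions reject non-equality join conditions; if so, joining on a constant column added to both sides expresses the same thing.

    -- as suggested: replace the cross join with a full outer join on a
    -- condition that is always true
    SELECT i.id, k.keyword
    FROM import1 i
    FULL OUTER JOIN et_keywords k
    ON (1 = 1);

    -- equivalent formulation if the constant condition is rejected:
    -- SELECT i.id, k.keyword
    -- FROM (SELECT id, 1 AS j FROM import1) i
    -- FULL OUTER JOIN (SELECT keyword, 1 AS j FROM et_keywords) k
    -- ON (i.j = k.j);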

Re: Best way to avoid cross join

2014-03-19 Thread fab wol
Anyone? Still haven't solved this problem. Any help is appreciated. Cheers Wolli 2014-03-14 10:55 GMT+01:00 fab wol : > Hey Nitin, > > import1 has at least 1.2 million rows, with almost the same number of > distinct id's and approximately 40k distinct keywords. et_keywords contains > roundabout