Question about sorted tables

2011-08-02 Thread Ajo Fod
Hello Hive Gurus, I am not sure if my system is using the sorting feature. In summary: - I expected to save time on the sorting step because I was using pre-sorted data, but the query plan seems to indicate an intermediate sorting step. === The Setup
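For reference, a minimal sketch of the kind of setup this question is about; the table, column names, and bucket count are made up, and the exact flags depend on the Hive version:

  -- declare the table bucketed and sorted on the key of interest
  CREATE TABLE events_sorted (ts BIGINT, val DOUBLE)
  CLUSTERED BY (ts) SORTED BY (ts) INTO 32 BUCKETS;

  -- ask Hive to enforce bucketing/sorting when populating it
  SET hive.enforce.bucketing = true;
  SET hive.enforce.sorting = true;
  INSERT OVERWRITE TABLE events_sorted SELECT ts, val FROM events_raw;

  -- then check whether the plan still contains a sort/shuffle stage
  EXPLAIN SELECT ts, val FROM events_sorted ORDER BY ts LIMIT 10;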

Re: Hive too slow?

2011-03-11 Thread Ajo Fod
Most likely, Hadoop's memory settings are too high and Linux starts swapping. You should be able to detect that too using vmstat. Just a guess. On Mon, Mar 7, 2011 at 10:11 PM, Ajo Fod ajo@gmail.com wrote

Re: Hive too slow?

2011-03-07 Thread Ajo Fod
In my experience, Hive is not instantaneous like other DBs, but 4 minutes to count 2200 rows seems unreasonable. For comparison, my query of 169k rows on one computer with 4 cores running at 1 GHz (approx) took 20 seconds. Cheers, Ajo. On Mon, Mar 7, 2011 at 1:19 AM, abhishek pathak

Re: Hive too slow?

2011-03-07 Thread Ajo Fod
you point me at some places where I can get some info on how to tune this up? Regards, Abhishek -- *From:* Ajo Fod ajo@gmail.com *To:* user@hive.apache.org *Sent:* Mon, 7 March, 2011 9:21:51 PM *Subject:* Re: Hive too slow? In my experience, Hive

Re: Stats Gathering Problems

2011-03-04 Thread Ajo Fod
The good news is that this is a simple XML section ... and this looks like an XML read error. Try copying one of the existing property sections and pasting over just the name and value strings from the message. Cheers, Ajo On Fri, Mar 4, 2011 at 6:40 AM, Anja Gruenheid

Re: Trouble using mysql metastore

2011-03-02 Thread Ajo Fod
wrote: Usually this is caused by not having the MySQL JDBC driver on the classpath (it's not included in Hive by default). Just put the MySQL JDBC driver in the Hive folder under lib/. On 03/02/2011 03:15 PM, Ajo Fod wrote: I've checked the MySQL connection with a separate Java file with the same

Re: Percent Rank Calculation

2011-03-01 Thread Ajo Fod
I know a call of this type would give you the result for a subset of the table ... also, I think you can use a GROUP BY clause to get it for groups of data. SELECT PERCENTILE(val, 0.5) FROM pct_test WHERE val < 100; Couldn't you use this call a few times to get the value for each percentile value? I think
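A sketch of the grouped variant being suggested; pct_test and val come from the thread, while grp is a hypothetical grouping column:

  -- median of val within each group; repeat with 0.25, 0.75, etc. for other percentiles
  SELECT grp, PERCENTILE(val, 0.5)
  FROM pct_test
  GROUP BY grp;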

Re: cannot start the transform script. reason : argument list too long

2011-03-01 Thread Ajo Fod
Instead of using 'python2.6 user_id_output.py hbase', try something like this: using 'user_id_output.py' ... and a #! line with the location of the Python binary. I think you can include a parameter in the call too, like: using 'user_id_output.py hbase' Cheers, Ajo. On Tue, Mar 1, 2011 at
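A sketch of how that reads in the query itself; the table and column names are invented, while the script name and the 'hbase' argument come from the thread (the script needs a #! line pointing at the Python binary and must be executable):

  -- ship the script with the job so TRANSFORM can find it on the cluster
  ADD FILE user_id_output.py;

  SELECT TRANSFORM (user_id, url)
    USING 'user_id_output.py hbase'
    AS (user_id, result)
  FROM user_logs;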

Re: calculating unique views based on ip, session_id

2011-02-21 Thread Ajo Fod
You can group by item_sid (drop session_id and ip_number from the group by clause) and then join with the parent table to get session_id and ip_number. -Ajo On Mon, Feb 21, 2011 at 3:07 AM, Cam Bazz camb...@gmail.com wrote: Hello, So I have a table of item views with item_sid, ip_number,
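A sketch of that approach; item_raw and the column names are taken from this thread, and hits is a hypothetical alias (whether COUNT(*) or a COUNT(DISTINCT ...) is wanted depends on how a unique view is defined):

  SELECT r.item_sid, r.session_id, r.ip_number, h.hits
  FROM item_raw r
  JOIN (
    SELECT item_sid, COUNT(*) AS hits
    FROM item_raw
    GROUP BY item_sid
  ) h ON (r.item_sid = h.item_sid);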

Re: Database/Schema , INTERVAL and SQL IN usages in Hive

2011-02-21 Thread Ajo Fod
On using SQL IN ... what would happen if you created a short table with the entries in the IN clause and used an inner join? -Ajo On Mon, Feb 21, 2011 at 7:57 AM, Bejoy Ks bejoy...@yahoo.com wrote: Thanks Jov for the quick response. Could you please let me know which is the latest stable
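A sketch of that workaround with hypothetical names; in_values stands for the short table holding the entries that would otherwise sit in the IN clause:

  -- instead of: SELECT * FROM big_table WHERE some_col IN (...)
  SELECT b.*
  FROM big_table b
  JOIN in_values v ON (b.some_col = v.some_col);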

Re: calculating unique views based on ip, session_id

2011-02-21 Thread Ajo Fod
camb...@gmail.com wrote: Hello, I did not understand this: when I do a: select item_sid, count(*) from item_raw group by item_sid I get hits per item. How do we join this to the master table? Best regards, -c.b. On Mon, Feb 21, 2011 at 6:28 PM, Ajo Fod ajo@gmail.com wrote: You

Re: Importing a file which includes delimiter like into HIVE

2011-02-14 Thread Ajo Fod
use delimited by | ... are you using this syntax? Are you saying that the syntax here does not work for you? http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Create_Table ... if you tried this ... could it be that the error is caused by something else? Cheers, -Ajo On Mon, Feb 14,
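For reference, a minimal sketch of the DDL being pointed at; the table and column names are made up, the pipe delimiter is the point:

  CREATE TABLE raw_import (col1 STRING, col2 STRING, col3 INT)
  ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '|'
  STORED AS TEXTFILE;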

Re: how far can I go with a 1 node cluster

2011-02-14 Thread Ajo Fod
Yes, I've often wondered about asymmetric configurations. Is there a mechanism for map/reduce jobs to be aware of differences between processor speeds and allocate less work to the slower processors? To try to answer the question here: I have not had much experience with

Re: Loading files into tables

2011-02-01 Thread Ajo Fod
TABLE tablename_new SELECT * FROM tablename ... (kind of) So those LOCAL tables are kind of temporary. Amlan On Tue, Feb 1, 2011 at 6:51 PM, Ajo Fod ajo@gmail.com wrote: Look up LOCAL here: http://wiki.apache.org/hadoop/Hive/GettingStarted -Ajo. On Tue, Feb 1, 2011 at 3:15
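For reference, a minimal sketch of the LOCAL load being discussed; the path is hypothetical, and the second statement is only a guess at the INSERT that the truncated snippet above starts with:

  -- load a file from the local filesystem (not HDFS) into the table
  LOAD DATA LOCAL INPATH '/tmp/data.txt' INTO TABLE tablename;

  -- presumably the truncated statement above, written out in full
  INSERT OVERWRITE TABLE tablename_new SELECT * FROM tablename;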

Re: small files with hive and hadoop

2011-01-31 Thread Ajo Fod
I've noticed that it takes a while for each map job to be set up in Hive ... and the way I set up the job, I noticed that there were as many maps as files/buckets. I read a recommendation somewhere to design jobs such that they take at least a minute. Cheers, -Ajo. On Mon, Jan 31, 2011 at 8:08
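One common knob for the many-small-files / many-maps situation, assuming a Hive release that ships CombineHiveInputFormat (a general suggestion, not something stated in the thread):

  -- let a single map task read several small files or buckets
  SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
  -- and merge small files produced by map-only jobs
  SET hive.merge.mapfiles=true;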

On compressed storage : why are sequence files bigger than text files?

2011-01-18 Thread Ajo Fod
Hello, My questions in short are: - why are sequence files bigger than text files (considering that they are binary)? - It looks like compression does not make for a smaller sequence file than the original text file. -- Here is sample data that is transferred into the tables below with an INSERT
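For context, a sketch of the settings usually involved when comparing compressed SequenceFile output with plain text; the codec choice and the table names are illustrative:

  SET hive.exec.compress.output=true;
  SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
  -- BLOCK compression usually yields smaller sequence files than RECORD
  SET io.seqfile.compression.type=BLOCK;

  CREATE TABLE t_seq (k STRING, v STRING) STORED AS SEQUENCEFILE;
  INSERT OVERWRITE TABLE t_seq SELECT k, v FROM t_text;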

Re: On compressed storage : why are sequence files bigger than text files?

2011-01-18 Thread Ajo Fod
...@gmail.com wrote: On Tue, Jan 18, 2011 at 10:25 AM, Ajo Fod ajo@gmail.com wrote: I tried with the gzip compression codec. BTW, what do you think of bz2? I've read that it is possible to split it as input to different mappers ... is there a catch? Here are my flags now ... of these the last

Re: partitioned column join does not work as expected

2011-01-18 Thread Ajo Fod
Can you try this with a dummy table with very few rows ... to see if the reason the script doesn't finish is a computational issue? One other thing is to try with a combined partition, to see if it is a problem with the partitioning. Also, take a look at the results of an EXPLAIN statement, see
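A sketch of the kind of check being suggested; the tables, the dt partition column, and the predicate are hypothetical:

  -- inspect the plan: is the partition predicate pruned, or is every partition scanned?
  EXPLAIN EXTENDED
  SELECT a.*
  FROM big_part a
  JOIN small_part b ON (a.dt = b.dt)
  WHERE a.dt = '2011-01-01';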