Hello Hive Gurus,
I am not sure if my system is using the sorting feature.
In summary:
- I expected to save time on the sorting step because I was using
pre-sorted data, but the query plan seems to indicate an intermediate
sorting step.
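For context, a minimal sketch of how pre-sorted data is usually declared to Hive so the planner can know about it; the table and column names here are illustrative, not from my actual setup:

```sql
-- Declaring the table bucketed and sorted tells Hive the data is
-- pre-sorted on ts; hive.enforce.sorting makes inserts preserve that order.
CREATE TABLE ticks_sorted (ts BIGINT, price DOUBLE)
CLUSTERED BY (ts) SORTED BY (ts ASC) INTO 32 BUCKETS;

SET hive.enforce.sorting = true;
INSERT OVERWRITE TABLE ticks_sorted SELECT ts, price FROM ticks_raw;
```

Even with a table declared this way, my understanding is that the plan can still show a sort stage for queries the optimizer cannot prove are order-preserving, which may be what I am seeing.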
=== The Setup
*To:* user@hive.apache.org
*Sent:* Tue, 8 March, 2011 11:47:20 AM
*Subject:* Re: Hive too slow?
Most likely, Hadoop's memory settings are too high and Linux starts
swapping. You should be able to detect that too using vmstat.
Just a guess.
On Mon, Mar 7, 2011 at 10:11 PM, Ajo Fod ajo@gmail.com wrote:
In my experience, Hive is not instantaneous like other DBs, but 4 minutes to
count 2200 rows seems unreasonable.
For comparison, my query of 169k rows on one computer with 4 cores running at
approximately 1 GHz took 20 seconds.
Cheers,
Ajo.
On Mon, Mar 7, 2011 at 1:19 AM, abhishek pathak wrote:
Could you point me at some places
where I can get some info on how to tune this up?
Regards,
Abhishek
--
*From:* Ajo Fod ajo@gmail.com
*To:* user@hive.apache.org
*Sent:* Mon, 7 March, 2011 9:21:51 PM
*Subject:* Re: Hive too slow?
In my experience, Hive is not instantaneous like other DBs, but 4 minutes to
count 2200 rows seems unreasonable.
The good news is that this is a simple XML section ... and this looks like an
XML read error.
Try copying one of the existing property sections and pasting over
just the name and value strings from the message.
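For reference, this is the shape a property section in hive-site.xml should have; the name and value below are illustrative placeholders, not the ones from your error message:

```xml
<!-- One property per <property> block; malformed or unclosed tags here
     typically surface as an XML read error at startup. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/metastore</value>
</property>
```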
Cheers,
Ajo
On Fri, Mar 4, 2011 at 6:40 AM, Anja Gruenheid
wrote:
Usually this is caused by not having the MySQL JDBC driver on the
classpath (it is not included in Hive by default).
Just put the MySQL JDBC driver jar in the Hive folder under lib/.
On 03/02/2011 03:15 PM, Ajo Fod wrote:
I've checked the mysql connection with a separate java file with the same
I know this type of call would give you a subset of the table ... also
I think you can use a GROUP BY clause to get it for groups of data.
SELECT PERCENTILE(val, 0.5) FROM pct_test WHERE val < 100;
Couldn't you use this call a few times to get the value for each percentile
value?
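Rather than repeating the call, I believe PERCENTILE also accepts an array of percentile values, so one pass can return several cut points (sketch below reuses the table and column names from the example above):

```sql
-- Returns an array with the 25th, 50th and 75th percentiles in one scan.
-- Note: PERCENTILE expects an integer column; for doubles there is
-- PERCENTILE_APPROX.
SELECT PERCENTILE(val, array(0.25, 0.5, 0.75)) FROM pct_test;
```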
I think
instead of
using 'python2.6 user_id_output.py hbase'
try something like this:
using 'user_id_output.py'
... and a #! line with the location of the python binary.
I think you can include a parameter in the call too, like:
using 'user_id_output.py hbase'
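To make the suggestion concrete, here is a hypothetical sketch of what user_id_output.py could look like as a Hive streaming script; the actual transform logic of the original script is unknown, so the tag-prefixing body below is purely illustrative:

```python
#!/usr/bin/env python
# Sketch of a Hive TRANSFORM ... USING streaming script: Hive feeds
# tab-separated rows on stdin and reads tab-separated rows from stdout.
import sys

def transform(line, tag):
    # Illustrative logic: prefix the first column (user_id) with the
    # command-line argument (e.g. 'hbase' in USING 'user_id_output.py hbase').
    user_id = line.rstrip("\n").split("\t")[0]
    return "%s\t%s" % (tag, user_id)

if __name__ == "__main__" and not sys.stdin.isatty():
    tag = sys.argv[1] if len(sys.argv) > 1 else "default"
    for line in sys.stdin:
        print(transform(line, tag))
```

With the #! line and the script made executable, the query should only need `ADD FILE user_id_output.py;` and then `USING 'user_id_output.py hbase'` in the TRANSFORM clause.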
Cheers,
Ajo.
On Tue, Mar 1, 2011 at
You can group by item_sid (drop session_id and ip_number from group by
clause) and then join with the parent table to get session_id and
ip_number.
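A sketch of that rewrite, assuming the views table is called item_views and that item_sid is the join key (names follow the thread, but the exact schema is a guess):

```sql
-- Aggregate on item_sid alone, then join back to recover the other columns.
SELECT c.item_sid, c.hits, v.session_id, v.ip_number
FROM (SELECT item_sid, COUNT(*) AS hits
      FROM item_views
      GROUP BY item_sid) c
JOIN item_views v ON (c.item_sid = v.item_sid);
```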
-Ajo
On Mon, Feb 21, 2011 at 3:07 AM, Cam Bazz camb...@gmail.com wrote:
Hello,
So I have a table of item views with item_sid, ip_number,
On using SQL IN ... what would happen if you created a short table with the
entries in the IN clause and used an inner join?
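A hypothetical sketch of that approach, with illustrative table and column names:

```sql
-- Materialize the IN-list as a small lookup table, then inner join on it.
CREATE TABLE wanted_ids (item_sid BIGINT);
-- (load the handful of ids from the IN clause into wanted_ids, then:)
SELECT r.*
FROM item_raw r
JOIN wanted_ids w ON (r.item_sid = w.item_sid);
```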
-Ajo
On Mon, Feb 21, 2011 at 7:57 AM, Bejoy Ks bejoy...@yahoo.com wrote:
Thanks Jov for the quick response
Could you please let me know which is the latest stable
camb...@gmail.com wrote:
Hello,
I did not understand this:
when I do a:
select item_sid, count(*) from item_raw group by item_sid
I get hits per item.
how do we join this to the master table?
best regards,
-c.b.
On Mon, Feb 21, 2011 at 6:28 PM, Ajo Fod ajo@gmail.com wrote:
You use delimited by '|' ... are you using this syntax?
Are you saying that the syntax here does not work for you?
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Create_Table
... if you tried this ... could it be that the error is caused by
something else?
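For concreteness, the documented form of a '|'-delimited table looks like this (table and column names here are illustrative):

```sql
-- ROW FORMAT DELIMITED with a pipe as the field separator.
CREATE TABLE events (id BIGINT, label STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
```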
Cheers,
-Ajo
On Mon, Feb 14,
Yes, I've often wondered about asymmetric configurations. Is there a
mechanism for map/reduce jobs to be aware of differences
between the speeds of processors and to allocate less work to the slower
processors?
To try to answer the question here: I have not had much experience with
CREATE TABLE tablename_new AS SELECT * FROM tablename ... (kind of).
So those LOCAL tables are kind of temporary.
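As a sketch, the copy described above is the CTAS form (tablename is from the thread; whether it matches the original use case is an assumption):

```sql
-- Create-table-as-select: new table inherits the query's schema and data.
CREATE TABLE tablename_new AS SELECT * FROM tablename;
```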
Amlan
On Tue, Feb 1, 2011 at 6:51 PM, Ajo Fod ajo@gmail.com wrote:
Look up LOCAL here:
http://wiki.apache.org/hadoop/Hive/GettingStarted
-Ajo.
On Tue, Feb 1, 2011 at 3:15
I've noticed that it takes a while for each map job to be set up in Hive ...
and the way I set up the job, I noticed that there were as many maps as
files/buckets.
I read a recommendation somewhere to design jobs such that they take at
least a minute.
Cheers,
-Ajo.
On Mon, Jan 31, 2011 at 8:08
Hello,
My questions in short are:
- Why are sequence files bigger than text files (considering that they
are binary)?
- It looks like compression does not produce a smaller sequence file
than the original text file.
-- here is sample data that is transferred into the tables below with
an INSERT
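A minimal sketch of the two table variants being compared, with illustrative names (my own schema is elided above):

```sql
-- Same schema stored as text vs. as a SequenceFile.
CREATE TABLE t_text (k STRING, v STRING) STORED AS TEXTFILE;
CREATE TABLE t_seq  (k STRING, v STRING) STORED AS SEQUENCEFILE;
INSERT OVERWRITE TABLE t_seq SELECT * FROM t_text;
```

My understanding is that uncompressed SequenceFiles carry per-record key and length overhead plus periodic sync markers, which can make them larger than the equivalent plain text, so the observation may not be a misconfiguration.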
...@gmail.com wrote:
On Tue, Jan 18, 2011 at 10:25 AM, Ajo Fod ajo@gmail.com wrote:
I tried with the gzip compression codec. BTW, what do you think of
bz2? I've read that it can be split as input to different
mappers ... is there a catch?
Here are my flags now ... of these the last
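The flag list is truncated above; for reference, the settings of this kind typically look like the following sketch (these are common defaults, not my actual values):

```sql
-- Enable compressed query output and pick gzip as the codec.
SET hive.exec.compress.output = true;
SET mapred.output.compress = true;
SET mapred.output.compression.codec = org.apache.hadoop.io.compress.GzipCodec;
```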
Can you try this with a dummy table with very few rows ... to see if
the reason the script doesn't finish is a computational issue?
One other thing is to try with a combined partition, to see if it is a
problem with the partitioning.
Also, take a look at the results of an EXPLAIN statement.
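EXPLAIN just prefixes the query and prints the plan stages without running it; a sketch using an illustrative table from earlier in the thread:

```sql
-- Shows the map/reduce stages Hive would run for this aggregation.
EXPLAIN SELECT item_sid, COUNT(*) FROM item_raw GROUP BY item_sid;
```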