It will be tricky to clean up the data format, as part of each record consists
of somewhat arbitrary key-value pairs. I will try to create something similar,
though; it might take a bit. Thanks.
I've tried resetting the heap size, I think. I added the following block to my
mapred-site.xml:
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx512M</value>
</property>
Is that how I'm supposed to do that?
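A quick way to sanity-check a setting like this is to parse the file and confirm
that the value is a well-formed JVM heap flag such as -Xmx512M. The sketch below
is only an illustration: SAMPLE is a hypothetical stand-in for mapred-site.xml
(a real file wraps its property blocks in one <configuration> element), and
get_property and bad_heap_flags are helper names made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for mapred-site.xml; a real file wraps its
# <property> blocks in a single <configuration> root element.
SAMPLE = """<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512M</value>
  </property>
</configuration>"""

# JVM heap-size flags look like -Xmx512M, -Xms1g or -Xmn256m:
# -Xm, then one of x/s/n, then digits, then an optional k/m/g unit.
HEAP_FLAG = re.compile(r"^-Xm[xsn]\d+[kKmMgG]?$")


def get_property(xml_text, name):
    """Return the <value> text of the named property, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None


def bad_heap_flags(opts):
    """Return tokens that start with -Xm but are not well-formed heap flags."""
    return [tok for tok in opts.split()
            if tok.startswith("-Xm") and not HEAP_FLAG.match(tok)]


opts = get_property(SAMPLE, "mapred.child.java.opts")
print(opts, bad_heap_flags(opts))
```

An empty list from bad_heap_flags means every -Xm... token parsed as a heap
flag; it does not prove that the task JVMs actually picked the setting up.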
Thanks,
Pat
From: hadoop n00b [mailto:[email protected]]
Sent: Wednesday, January 26, 2011 9:09 PM
To: [email protected]
Subject: Re: Hive Error on medium sized dataset
We typically get this error while running complex queries on our 4-node setup
when the child JVM runs out of heap space. I would be interested in what the
experts have to say about this error.
On Thu, Jan 27, 2011 at 7:27 AM, Ajo Fod
<[email protected]> wrote:
Any chance you can convert the data to a tab separated text file and try the
same query?
It may not be the SerDe, but it would be good to rule it out as a potential
source of the problem.
-Ajo.
On Wed, Jan 26, 2011 at 5:47 PM, Christopher, Pat
<[email protected]> wrote:
Hi,
I'm attempting to load a small-to-medium-sized log file (~250MB) and produce
some basic reports from it: counts, etc. Nothing fancy. However, whenever I
try to read the entire dataset (~330k rows), I get the following error:
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.MapRedTask
Even basic queries produce this error, for example:
SELECT count(1) FROM medium_table;
However, if I do the following:
SELECT count(1) FROM ( SELECT col1 FROM medium_table LIMIT 70000 ) tbl;
It works until the LIMIT reaches roughly 70,800; beyond that I get the same
error again. I'm running HDFS in single-node, pseudo-distributed mode on a
virtual machine with 1.5GB of memory and 20GB of disk, and I am using a custom
SerDe. I don't think the SerDe is at fault, but I'm open to suggestions for how
to check whether it is causing the problem. I can't see anything in the data
that would cause it, though.
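The exact row count at which the query starts failing could be pinned down by
bisecting on the LIMIT value. This is a rough sketch, not something I have run
against Hive: first_failing_limit is a made-up helper, fake_query is a
hypothetical stand-in for actually running the query, and the 70,812 threshold
is invented purely for illustration. It also assumes that once the query fails
at some LIMIT it fails at every larger one, and that LIMIT returns the same
leading rows each time, which Hive does not guarantee without an ORDER BY.

```python
def first_failing_limit(query_succeeds, lo, hi):
    """Return the smallest limit in (lo, hi] for which query_succeeds is False.

    Assumes query_succeeds(lo) is True, query_succeeds(hi) is False, and
    that failures are monotonic: every limit above the first failure fails.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if query_succeeds(mid):
            lo = mid   # still works here, so the first failure is above mid
        else:
            hi = mid   # fails here, so the first failure is at or below mid
    return hi


# Hypothetical stand-in for running the real query with LIMIT <limit>;
# the threshold is an invented number, not a measured one.
def fake_query(limit):
    return limit < 70812


print(first_failing_limit(fake_query, 70000, 330000))  # prints 70812
```

In practice query_succeeds would wrap something like running the count query
with the given LIMIT through hive -e and checking the exit status; the row at
the reported position is then the first one worth inspecting by hand.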
Does anyone have any ideas about what might be causing this, or something I can
check?
Thanks,
Pat