Hive Error on medium-sized dataset

Christopher, Pat Wed, 26 Jan 2011 17:48:33 -0800

Hi,
I'm attempting to load a small- to medium-sized log file (~250 MB) and produce 
some basic reports from it, such as counts. Nothing fancy. However, whenever I 
try to read the entire dataset (~330k rows), I get the following error:

  FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

This happens even with basic queries such as:

  SELECT count(1) FROM medium_table;

However, if I do the following:

  SELECT count(1) FROM ( SELECT col1 FROM medium_table LIMIT 70000 ) tbl;

It works until the LIMIT reaches about 70,800 rows; beyond that I get the same 
error again. I'm running Hadoop in single-node, pseudo-distributed mode on a 
virtual machine with 1.5 GB of memory and 20 GB of disk. I am also using a 
custom SerDe. I don't think the SerDe is the cause, but I'm open to suggestions 
for how to check whether it is. I can't see anything in the data that would 
cause this, though.
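One way to check the SerDe independently of Hive might be to run its per-row parsing logic over every line of a local copy of the log and stop at the first line it rejects. The Java sketch below is hypothetical: `SerDeLineCheck`, `LineParser`, `parseLine`, and the "exactly five tab-separated fields" rule are placeholders, not part of Hive. `parseLine` would need to be replaced with whatever the custom SerDe actually does to a row (for example, the body of its `deserialize` method), so the check does not depend on the Hive jars.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical harness: runs per-line parsing logic outside Hive and reports the first failure. */
public class SerDeLineCheck {

    /** Stand-in for the custom SerDe's parsing step; may throw on a bad record. */
    @FunctionalInterface
    public interface LineParser {
        Object parse(String line) throws Exception;
    }

    /** Outcome of a scan: failedLine is 1-based, or -1 if every line parsed. */
    public static final class ScanResult {
        public final long failedLine;
        public final String failedText;
        public final Exception error;

        ScanResult(long failedLine, String failedText, Exception error) {
            this.failedLine = failedLine;
            this.failedText = failedText;
            this.error = error;
        }
    }

    // Hypothetical record layout: exactly five tab-separated fields per line.
    static final int EXPECTED_FIELDS = 5;

    /** Placeholder parser; replace with the same logic the custom SerDe applies to each row. */
    public static Object parseLine(String line) {
        String[] fields = line.split("\t", -1); // -1 keeps trailing empty fields
        if (fields.length != EXPECTED_FIELDS) {
            throw new IllegalArgumentException(
                "expected " + EXPECTED_FIELDS + " fields but found " + fields.length);
        }
        return fields;
    }

    /** Feeds each line to the parser and stops at the first line that throws. */
    public static ScanResult scan(BufferedReader reader, LineParser parser) {
        long lineNo = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                try {
                    parser.parse(line);
                } catch (Exception e) {
                    return new ScanResult(lineNo, line, e);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ScanResult(-1, null, null);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SerDeLineCheck <local copy of the log file>");
            System.exit(1);
        }
        // Decode as UTF-8; malformed bytes are replaced rather than aborting the scan.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                Files.newInputStream(Paths.get(args[0])), StandardCharsets.UTF_8))) {
            ScanResult result = scan(reader, SerDeLineCheck::parseLine);
            if (result.failedLine < 0) {
                System.out.println("All lines parsed without error.");
            } else {
                System.out.println("First failure at line " + result.failedLine + ": " + result.error);
                System.out.println(result.failedText);
            }
        }
    }
}
```

Usage, assuming the file has first been copied out of HDFS (for example with `hadoop fs -get`) to a hypothetical local path: `java SerDeLineCheck /tmp/medium_log.txt`. If the first reported failure sits close to line 70,800, that would be consistent with the LIMIT behaviour above; if every line parses, the SerDe's per-row logic becomes a less likely suspect.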

Does anyone have any ideas about what might be causing this, or something I can check?

Thanks,
Pat
