You are likely encountering a bug in Amazon's S3 code:
https://forums.aws.amazon.com/thread.jspa?threadID=56358&tstart=25

Try inserting into a non-S3-backed table to see whether this is indeed your problem.
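
For example, something along these lines (a sketch; pvs_hdfs and its HDFS location are placeholder names) re-runs the same insert against an HDFS-backed copy of the table:

     -- Same schema as pvs, but backed by HDFS instead of S3;
     -- if this insert succeeds, the S3 bug is the likely culprit
     create external table pvs_hdfs (
       time INT,
       server STRING,
       thread_id STRING
     )
     partitioned by (dt string)
     row format delimited fields terminated by '\t'
     stored as textfile
     location '/tmp/pvs_hdfs/';  -- placeholder HDFS path

     INSERT OVERWRITE TABLE pvs_hdfs PARTITION (dt)
     SELECT time, server, thread_id, dt
       FROM raw_pvs
      where dt > '2011_01_00' and dt < '2011_01_02';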

Based on the Amazon forums, a fix is expected this week:
https://forums.aws.amazon.com/thread.jspa?threadID=60149&tstart=0

Ryan

On 02/12/2011 11:08 PM, khassou...@mediumware.net wrote:
Hello,

I have the following table definition (simplified to help in debugging):

     create external table pvs (
       time INT,
       server STRING,
       thread_id STRING
     )
     partitioned by (
       dt string
     )
     row format delimited fields terminated by '\t'
     stored as textfile
     location 's3://dev-elastic/logz/';

I have another table, raw_pvs, from which I want to import data into pvs
using the following statement:

     INSERT OVERWRITE TABLE pvs PARTITION (dt)
     SELECT s.time, s.server, s.thread_id, s.dt
       FROM (
         FROM raw_pvs
         SELECT raw_pvs.time, raw_pvs.server, raw_pvs.thread_id, raw_pvs.dt
         where dt > '2011_01_00' and dt < '2011_01_02'
         limit 100
       ) s;
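
For reference, dynamic partitioning is enabled in my session; as far as I know, an INSERT with a bare PARTITION (dt) clause needs settings along these lines:

     -- enable dynamic partitions for INSERT ... PARTITION (dt)
     set hive.exec.dynamic.partition=true;
     -- nonstrict mode lets every partition column be dynamic
     set hive.exec.dynamic.partition.mode=nonstrict;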

I keep getting the following error:

     Total MapReduce jobs = 1
     Launching Job 1 out of 1
     Number of reduce tasks determined at compile time: 1
     In order to change the average load for a reducer (in bytes):
       set hive.exec.reducers.bytes.per.reducer=<number>
     In order to limit the maximum number of reducers:
       set hive.exec.reducers.max=<number>
     In order to set a constant number of reducers:
       set mapred.reduce.tasks=<number>
     Starting Job = job_201102111900_0003, Tracking URL = http://ip-10-204-190-203.ec2.internal:9100/jobdetails.jsp?jobid=job_201102111900_0003
     Kill Command = /home/hadoop/.versions/0.20/bin/../bin/hadoop job -Dmapred.job.tracker=ip-10-204-190-203.ec2.internal:9001 -kill job_201102111900_0003
     2011-02-12 01:11:07,649 Stage-1 map = 0%,  reduce = 0%
     2011-02-12 01:11:09,733 Stage-1 map = 20%,  reduce = 0%
     2011-02-12 01:11:12,785 Stage-1 map = 100%,  reduce = 0%
     2011-02-12 01:11:18,868 Stage-1 map = 100%,  reduce = 100%
     Ended Job = job_201102111900_0003
     Loading data to table pvs partition (dt=null)
     Failed with exception dt not found in table's partition spec: {dt=null}
     FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

When I run the subquery directly (FROM raw_pvs SELECT raw_pvs.time,
raw_pvs.server, raw_pvs.thread_id, raw_pvs.dt where dt > '2011_01_00' and
dt < '2011_01_02' limit 100), I get 100 rows with no NULLs in them, so this
doesn't seem like a data issue.
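
In case it could still be bad data, a quick check along these lines (a sketch; I'm assuming dt can be read as an ordinary column of raw_pvs) should show whether the partition column is ever NULL:

     -- a non-zero count would point to a data problem after all
     SELECT COUNT(*)
       FROM raw_pvs
      WHERE dt IS NULL;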

Does anyone know what I'm doing wrong? I've been stuck on this for a few days!

Thank you,
Khaled

