Hello,

I have the following table definition (simplified to make debugging easier):

    create external table pvs (
      time INT,
      server STRING,
      thread_id STRING
    )
    partitioned by (
      dt string
    )
    row format delimited fields terminated by '\t'
    stored as textfile
    location 's3://dev-elastic/logz/';

I have another table, raw_pvs, from which I want to load data into pvs
using the following statement:

    INSERT OVERWRITE TABLE pvs PARTITION (dt)
    SELECT s.time, s.server, s.thread_id, s.dt
      FROM (
        FROM raw_pvs
        SELECT raw_pvs.time, raw_pvs.server, raw_pvs.thread_id, raw_pvs.dt
        WHERE dt > '2011_01_00' AND dt < '2011_01_02'
        LIMIT 100
      ) s;
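
As far as I know, a dynamic-partition insert like this needs the session
settings below, so I'm listing them here in case one of them is missing or
wrong in my setup (these are the property names from the Hive docs; I'm not
100% sure my session has exactly these values):

    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;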

I keep getting the following error:

    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapred.reduce.tasks=<number>
    Starting Job = job_201102111900_0003, Tracking URL = http://ip-10-204-190-203.ec2.internal:9100/jobdetails.jsp?jobid=job_201102111900_0003
    Kill Command = /home/hadoop/.versions/0.20/bin/../bin/hadoop job -Dmapred.job.tracker=ip-10-204-190-203.ec2.internal:9001 -kill job_201102111900_0003
    2011-02-12 01:11:07,649 Stage-1 map = 0%,  reduce = 0%
    2011-02-12 01:11:09,733 Stage-1 map = 20%,  reduce = 0%
    2011-02-12 01:11:12,785 Stage-1 map = 100%,  reduce = 0%
    2011-02-12 01:11:18,868 Stage-1 map = 100%,  reduce = 100%
    Ended Job = job_201102111900_0003
    Loading data to table pvs partition (dt=null)
    Failed with exception dt not found in table's partition spec: {dt=null}
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

When I run the subquery by itself:

    FROM raw_pvs
    SELECT raw_pvs.time, raw_pvs.server, raw_pvs.thread_id, raw_pvs.dt
    WHERE dt > '2011_01_00' AND dt < '2011_01_02'
    LIMIT 100

I get 100 rows back, none of which have a null dt, so this doesn't seem
like a data issue.
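
Just in case empty strings (rather than nulls) are sneaking into dt, one
extra check I could run would be something along these lines (I haven't
tried it yet):

    SELECT count(*)
    FROM raw_pvs
    WHERE dt IS NULL OR dt = '';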

Does anyone know what I'm doing wrong? I've been stuck on this for a few days!

Thank you,
Khaled
