Thanks, Ryan... that does seem to be my issue. I had found the first thread after I sent this email, but not the second thread saying it will be fixed next week.
thanks
Khaled

> You are likely encountering a bug w/ Amazon's S3 code:
> https://forums.aws.amazon.com/thread.jspa?threadID=56358&tstart=25
>
> Try inserting into a non-S3 backed table to see if this is indeed your
> problem.
>
> Based on the Amazon forums they are expecting a fix this week:
> https://forums.aws.amazon.com/thread.jspa?threadID=60149&tstart=0
>
> Ryan
>
> On 02/12/2011 11:08 PM, khassou...@mediumware.net wrote:
>> Hello,
>>
>> I have the following table definition (simplified to help in debugging):
>>
>> create external table pvs (
>>   time INT,
>>   server STRING,
>>   thread_id STRING
>> )
>> partitioned by (
>>   dt string
>> )
>> row format delimited fields terminated by '\t'
>> stored as textfile
>> location 's3://dev-elastic/logz/';
>>
>> I have another table raw_pvs that I want to import data from into pvs
>> using the following statement:
>>
>> INSERT OVERWRITE TABLE pvs PARTITION (dt)
>> SELECT s.time, s.server, s.thread_id, s.dt
>> FROM (
>>   FROM raw_pvs SELECT raw_pvs.time, raw_pvs.server,
>>     raw_pvs.thread_id, raw_pvs.dt where dt > '2011_01_00' and
>>     dt < '2011_01_02' limit 100
>> ) s;
>>
>> I keep getting the error
>>
>> Total MapReduce jobs = 1
>> Launching Job 1 out of 1
>> Number of reduce tasks determined at compile time: 1
>> In order to change the average load for a reducer (in bytes):
>>   set hive.exec.reducers.bytes.per.reducer=<number>
>> In order to limit the maximum number of reducers:
>>   set hive.exec.reducers.max=<number>
>> In order to set a constant number of reducers:
>>   set mapred.reduce.tasks=<number>
>> Starting Job = job_201102111900_0003, Tracking URL =
>>   http://ip-10-204-190-203.ec2.internal:9100/jobdetails.jsp?jobid=job_201102111900_0003
>> Kill Command = /home/hadoop/.versions/0.20/bin/../bin/hadoop job
>>   -Dmapred.job.tracker=ip-10-204-190-203.ec2.internal:9001 -kill
>>   job_201102111900_0003
>> 2011-02-12 01:11:07,649 Stage-1 map = 0%, reduce = 0%
>> 2011-02-12 01:11:09,733 Stage-1 map = 20%, reduce = 0%
>> 2011-02-12 01:11:12,785 Stage-1 map = 100%, reduce = 0%
>> 2011-02-12 01:11:18,868 Stage-1 map = 100%, reduce = 100%
>> Ended Job = job_201102111900_0003
>> Loading data to table pvs partition (dt=null)
>> Failed with exception dt not found in table's partition spec: {dt=null}
>> FAILED: Execution Error, return code 1 from
>>   org.apache.hadoop.hive.ql.exec.MoveTask
>>
>> When I run the subquery directly (FROM raw_pvs SELECT raw_pvs.time,
>> raw_pvs.server, raw_pvs.thread_id, raw_pvs.dt where dt > '2011_01_00' and
>> dt < '2011_01_02' limit 100) I get 100 rows with no nulls in them, so this
>> doesn't seem like a data issue.
>>
>> Does anyone know what I'm doing wrong? I've been stuck on this for a few
>> days!
>>
>> thank you
>> Khaled
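
[Editor's note: for anyone landing on this thread later, here is a minimal sketch of the test Ryan suggests: run the same dynamic-partition insert into a table that is not backed by S3. The table name pvs_local is hypothetical and not from the thread; it simply omits the LOCATION clause so the table lands in the default (HDFS) warehouse directory. The two SET statements are the standard Hive settings required for a fully dynamic partition insert and are assumed here, not quoted from the original emails.]

-- Same schema as pvs, but HDFS-backed (no S3 LOCATION).
create table pvs_local (
  time INT,
  server STRING,
  thread_id STRING
)
partitioned by (dt string)
row format delimited fields terminated by '\t'
stored as textfile;

-- Dynamic partition inserts must be enabled; "nonstrict" mode is needed
-- because dt, the only partition column, is populated dynamically.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Same insert as in the original email, retargeted at the HDFS table.
INSERT OVERWRITE TABLE pvs_local PARTITION (dt)
SELECT s.time, s.server, s.thread_id, s.dt
FROM (
  FROM raw_pvs SELECT raw_pvs.time, raw_pvs.server,
    raw_pvs.thread_id, raw_pvs.dt where dt > '2011_01_00' and
    dt < '2011_01_02' limit 100
) s;

If this insert succeeds while the identical insert into the S3-backed table fails with "dt not found in table's partition spec: {dt=null}", the data and the query are fine, and the failure is in the S3 move step, consistent with the AWS forum threads linked above.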