Thanks, Ryan... that does seem to be my issue. I had found the first thread after I sent this email, but not the second thread saying it will be fixed next week.
thanks
Khaled

> You are likely encountering a bug w/ Amazon's S3 code:
> https://forums.aws.amazon.com/thread.jspa?threadID=56358&tstart=25
>
> Try inserting into a non-S3 backed table to see if this is indeed your
> problem.
>
> Based on the Amazon forums they are expecting a fix this week:
> https://forums.aws.amazon.com/thread.jspa?threadID=60149&tstart=0
>
> Ryan
>
> On 02/12/2011 11:08 PM, khassou...@mediumware.net wrote:
>> Hello,
>>
>> I have the following table definition (simplified to help in debugging):
>>
>> create external table pvs (
>>   time INT,
>>   server STRING,
>>   thread_id STRING
>> )
>> partitioned by (
>>   dt string
>> )
>> row format delimited fields terminated by '\t'
>> stored as textfile
>> location 's3://dev-elastic/logz/';
>>
>> I have another table raw_pvs that I want to import data from into pvs
>> using the following statement:
>>
>> INSERT OVERWRITE TABLE pvs PARTITION (dt)
>> SELECT s.time, s.server, s.thread_id, s.dt
>> FROM (
>>   FROM raw_pvs SELECT raw_pvs.time, raw_pvs.server,
>>     raw_pvs.thread_id, raw_pvs.dt where dt > '2011_01_00' and
>>     dt < '2011_01_02' limit 100
>> ) s;
>>
>> I keep getting the error
>>
>> Total MapReduce jobs = 1
>> Launching Job 1 out of 1
>> Number of reduce tasks determined at compile time: 1
>> In order to change the average load for a reducer (in bytes):
>>   set hive.exec.reducers.bytes.per.reducer=<number>
>> In order to limit the maximum number of reducers:
>>   set hive.exec.reducers.max=<number>
>> In order to set a constant number of reducers:
>>   set mapred.reduce.tasks=<number>
>> Starting Job = job_201102111900_0003, Tracking URL =
>>   http://ip-10-204-190-203.ec2.internal:9100/jobdetails.jsp?jobid=job_201102111900_0003
>> Kill Command = /home/hadoop/.versions/0.20/bin/../bin/hadoop job
>>   -Dmapred.job.tracker=ip-10-204-190-203.ec2.internal:9001 -kill
>>   job_201102111900_0003
>> 2011-02-12 01:11:07,649 Stage-1 map = 0%, reduce = 0%
>> 2011-02-12 01:11:09,733 Stage-1 map = 20%, reduce = 0%
>> 2011-02-12 01:11:12,785 Stage-1 map = 100%, reduce = 0%
>> 2011-02-12 01:11:18,868 Stage-1 map = 100%, reduce = 100%
>> Ended Job = job_201102111900_0003
>> Loading data to table pvs partition (dt=null)
>> Failed with exception dt not found in table's partition spec: {dt=null}
>> FAILED: Execution Error, return code 1 from
>>   org.apache.hadoop.hive.ql.exec.MoveTask
>>
>> When I run the subquery directly (FROM raw_pvs SELECT raw_pvs.time,
>> raw_pvs.server, raw_pvs.thread_id, raw_pvs.dt where dt > '2011_01_00' and
>> dt < '2011_01_02' limit 100) I get 100 rows with no nulls in them, so this
>> doesn't seem like a data issue.
>>
>> Does anyone know what I'm doing wrong? I've been stuck on this for a few
>> days!
>>
>> thank you
>> Khaled
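
[Editor's note: for anyone landing on this thread later, here is a minimal sketch of the test Ryan suggests: run the same dynamic-partition insert into a table that is not backed by S3. The table name pvs_local is hypothetical and not from the thread; it simply omits the LOCATION clause so the table lands in the default (HDFS) warehouse directory. The two SET statements are the standard Hive settings required for a fully dynamic partition insert and are assumed here, not quoted from the original emails.]

-- Same schema as pvs, but HDFS-backed (no S3 LOCATION).
create table pvs_local (
  time INT,
  server STRING,
  thread_id STRING
)
partitioned by (dt string)
row format delimited fields terminated by '\t'
stored as textfile;

-- Dynamic partition inserts must be enabled; "nonstrict" mode is needed
-- because dt, the only partition column, is populated dynamically.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Same insert as in the original email, retargeted at the HDFS table.
INSERT OVERWRITE TABLE pvs_local PARTITION (dt)
SELECT s.time, s.server, s.thread_id, s.dt
FROM (
  FROM raw_pvs SELECT raw_pvs.time, raw_pvs.server,
    raw_pvs.thread_id, raw_pvs.dt where dt > '2011_01_00' and
    dt < '2011_01_02' limit 100
) s;

If this insert succeeds while the identical insert into the S3-backed table fails with "dt not found in table's partition spec: {dt=null}", the data and the query are fine, and the failure is in the S3 move step, consistent with the AWS forum threads linked above.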