Hello Haitao,
Each time we run a MapReduce job, the job expects the output path not to
exist. If the output path is already there, a
FileAlreadyExistsException is thrown. Since every Pig script is
eventually executed as one or more MapReduce jobs, the same rule applies
to Pig.
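
To illustrate the rule, here is a minimal sketch that mimics the check on
the local filesystem. It is not Hadoop or Pig code: FileAlreadyExistsError,
check_output_specs, run_job and the overwrite flag are hypothetical names
used only for this example.

```python
import os
import shutil
import tempfile


class FileAlreadyExistsError(Exception):
    """Local stand-in for Hadoop's FileAlreadyExistsException (hypothetical)."""


def check_output_specs(output_dir):
    # Refuse to run if the output directory is already present,
    # mirroring the pre-submission check described above.
    if os.path.exists(output_dir):
        raise FileAlreadyExistsError(
            "Output directory %s already exists" % output_dir)


def run_job(output_dir, overwrite=False):
    # overwrite=True is a hypothetical convenience: remove stale output first.
    if overwrite and os.path.exists(output_dir):
        shutil.rmtree(output_dir)
    check_output_specs(output_dir)
    os.makedirs(output_dir)  # the "job" writes its output here
    return output_dir


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    out = os.path.join(base, "out")
    run_job(out)                  # first run succeeds
    try:
        run_job(out)              # second run hits the existing directory
    except FileAlreadyExistsError as err:
        print(err)
    run_job(out, overwrite=True)  # succeeds once the stale output is removed
    shutil.rmtree(base)
```

The second call fails only because the first one left its output behind,
which is why moving or removing the old directory, or choosing a new
output name, makes the error go away.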
Regards,
Mohammad Tariq
On Fri, Aug 10, 2012 at 11:18 PM, Alan Gates <[email protected]> wrote:
> Usually that means the directory you are trying to store to already
> exists. Pig won't overwrite existing data. You should either move or remove
> the directory or change the directory name in your store function.
>
> Alan.
>
> On Aug 9, 2012, at 7:42 PM, Haitao Yao wrote:
>
>> Hi all,
>> I got this while running a Pig script:
>>
>> 997: Unable to recreate exception from backend error:
>> org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://DC-hadoop01:9000/tmp/pig-temp/temp548500412/tmp-1456742965 already exists
>> at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:137)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:207)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:188)
>> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:893)
>> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
>> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:856)
>> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:830)
>> at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
>> at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>> at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
>> at java.lang.Thread.run(Thread.java:722)
>>
>>
>> But I checked the script, and the directory
>> hdfs://DC-hadoop01:9000/tmp/pig-temp/temp548500412/tmp-1456742965 is not
>> used by the script explicitly, so I think Pig uses it to store temporary
>> results.
>> But why does it already exist? Isn't it supposed to be unique?
>>
>> Haitao Yao
>> [email protected]
>> weibo: @haitao_yao
>> Skype: haitao.yao.final
>>
>