Re: FileAlreadyExistsException while running pig

Mohammad Tariq Fri, 10 Aug 2012 11:52:58 -0700

Hello Haitao,

    Each time we run a MapReduce job, the job expects its output directory
not to exist yet. If the output path is already there, a
FileAlreadyExistsException is thrown when the job is submitted. Since every
Pig job eventually runs as one or more MapReduce jobs, Pig has the same
requirement.
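To make that rule concrete, here is a minimal local-filesystem sketch of the kind of check that FileOutputFormat.checkOutputSpecs (the top frame of the stack trace quoted below) performs before a job is submitted. It is an illustration under stated assumptions, not Hadoop's code. The class OutputSpecCheck is hypothetical. The sketch uses java.nio.file instead of HDFS. java.nio's own FileAlreadyExistsException stands in for org.apache.hadoop.mapred.FileAlreadyExistsException.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical local-filesystem analog (not Hadoop's implementation) of the
// pre-submission output check: reject the job if its output directory exists.
class OutputSpecCheck {

    // Throws if the output directory is already present; otherwise returns
    // normally and the caller may create the directory and write into it.
    static void checkOutputSpecs(Path outputDir) throws IOException {
        if (Files.exists(outputDir)) {
            // java.nio's FileAlreadyExistsException stands in for
            // org.apache.hadoop.mapred.FileAlreadyExistsException here.
            throw new FileAlreadyExistsException(
                "Output directory " + outputDir + " already exists");
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("job-demo");
        Path out = base.resolve("output");

        checkOutputSpecs(out);       // passes: "output" does not exist yet
        Files.createDirectory(out);  // the first run writes its output

        try {
            checkOutputSpecs(out);   // a second run with the same path is rejected
        } catch (FileAlreadyExistsException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The check is deliberately strict. It refuses to run rather than silently overwrite or mix new output with old files.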

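Alan's reply below suggests two ways out: remove the old output directory, or store to a name that does not exist yet. The sketch that follows shows both on the local filesystem. PrepareOutputDir, removeIfExists and firstUnusedName are hypothetical names. On HDFS, the same steps would go through Hadoop's FileSystem API or the hadoop fs shell rather than java.nio.file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: the two remedies for a stale output directory,
// shown on the local filesystem rather than HDFS.
class PrepareOutputDir {

    // Remedy 1: remove the directory and everything under it, if it exists.
    static void removeIfExists(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return; // nothing to clean up
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse lexicographic order puts children before their parents,
            // so every directory is already empty when it is deleted.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Remedy 2: return the first sibling name prefix-1, prefix-2, ... that is not taken.
    // Check-then-use is not atomic: another writer could claim the name first.
    static Path firstUnusedName(Path base, String prefix) {
        for (int i = 1; ; i++) {
            Path candidate = base.resolve(prefix + "-" + i);
            if (!Files.exists(candidate)) {
                return candidate;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("store-demo");
        Path out = base.resolve("out");
        Files.createDirectories(out);                 // simulate a previous run's output
        Files.createFile(out.resolve("part-00000"));

        // Option A: keep the old data and store to a new name instead.
        System.out.println("Unused name: " + firstUnusedName(base, "out"));

        // Option B: delete the old output so the same name can be reused.
        removeIfExists(out);
        System.out.println("Old output still present? " + Files.exists(out));
    }
}
```

Both remedies apply to paths you name yourself in a STORE statement. They do not explain why Pig's own temporary directory already existed.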

Regards,
    Mohammad Tariq


On Fri, Aug 10, 2012 at 11:18 PM, Alan Gates <[email protected]> wrote:
> Usually that means the directory you are trying to store to already 
> exists.  Pig won't overwrite existing data.  You should either move or remove 
> the directory or change the directory name in your store function.
>
> Alan.
>
> On Aug 9, 2012, at 7:42 PM, Haitao Yao wrote:
>
>> Hi all,
>>       I got this error while running a Pig script:
>>
>> 997: Unable to recreate exception from backend error:
>> org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://DC-hadoop01:9000/tmp/pig-temp/temp548500412/tmp-1456742965 already exists
>>        at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:137)
>>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:207)
>>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:188)
>>        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:893)
>>        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at javax.security.auth.Subject.doAs(Subject.java:415)
>>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
>>        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:856)
>>        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:830)
>>        at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
>>        at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>>        at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
>>        at java.lang.Thread.run(Thread.java:722)
>>
>>
>> But I checked the script, and the directory
>> hdfs://DC-hadoop01:9000/tmp/pig-temp/temp548500412/tmp-1456742965 is not
>> used by the script explicitly, so I think Pig uses it to store temporary
>> results.
>> But why does it already exist? Isn't the name supposed to be unique?
>>
>> Haitao Yao
>> [email protected]
>> weibo: @haitao_yao
>> Skype:  haitao.yao.final
>>
>
