Thanks for the reply. I spent yesterday investigating and found that my 40 minutes are spent in
JsonMetadata.findMetaFile. This seems to be new in trunk. In my setting, I have
several thousand files/folders in my input, and findMetaFile reads them one by one,
which takes a long time. I also see there is an option in PigStorage to disable
it with "-noschema". Once I use "-noschema", I get my 40 minutes back. Can we
do something so others do not fall into this pitfall?
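For anyone else hitting this, the workaround looks roughly like the sketch below. The bucket path and delimiter here are made up for illustration; the second PigStorage argument is the options string, and "-noschema" is the option mentioned above:

```pig
-- Pass '-noschema' so PigStorage skips the side-file metadata lookup
-- (the per-directory scan done via JsonMetadata.findMetaFile).
A = LOAD 's3://my-bucket/input' USING PigStorage('\t', '-noschema');
```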
At 2011-12-30 03:52:34,"Dmitriy Ryaboy" <[email protected]> wrote:
>In the past, when I've observed this kind of insane behavior (no job should
>take 40 minutes to submit), it's been due to the NameNode or the JobTracker
>being extremely overloaded and responding slowly, causing timeouts+retries.
>
>2011/12/28 Thejas Nair <[email protected]>
>
>> I haven't seen/heard this issue.
>> Do you mean to say that the extra time is actually a delay before MR job
>> is launched ?
>> Did you have free map/reduce slots when you ran pig job from trunk ?
>>
>> Thanks,
>> Thejas
>>
>>
>>
>>
>> On 12/23/11 9:01 PM, Yang Ling wrote:
>>
>>> I have a Pig job that typically finishes in 20 minutes. I tried Pig code from
>>> trunk, and it takes more than 1 hour to finish. My input and output are on
>>> Amazon S3. One interesting thing is that it takes about 40 minutes to start the
>>> mapreduce job, while for the 0.9.1 release it takes less than 1 minute. Any
>>> idea?
>>>
>>
>>