Patch available. Please test whether it fixes the issue.
https://issues.apache.org/jira/browse/PIG-2453
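
For anyone who hits this before the patch is committed: the workaround
mentioned downthread is to pass '-noschema' to PigStorage so it skips the
per-file schema lookup. A minimal sketch (the S3 path and field list are
made up for illustration; check the PigStorage javadoc on your build for the
exact option syntax):

-- '-noschema' stops PigStorage from probing for a .pig_schema file per input
raw = LOAD 's3://my-bucket/logs/2011-12-*' USING PigStorage('\t', '-noschema')
      AS (user:chararray, ts:long, event:chararray);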

On Sun, Jan 1, 2012 at 7:39 PM, Dmitriy Ryaboy <[email protected]> wrote:

> Yang, can you send the load statement you are using and a rough
> description of the directory structure you are loading? That'll help test
> the fix.
>
> Thanks,
> D
>
>
> On Sun, Jan 1, 2012 at 6:02 PM, Dmitriy Ryaboy <[email protected]> wrote:
>
>> Filed https://issues.apache.org/jira/browse/PIG-2453
>>
>>
>> On Sun, Jan 1, 2012 at 5:17 PM, Dmitriy Ryaboy <[email protected]> wrote:
>>
>>> Ah. That's unfortunate. Yeah, reading thousands of small files is
>>> suboptimal (it's always suboptimal, but in this case, it's extra bad).
>>>
>>> Pig committers -- currently JsonMetadata.findMetaFile looks for a
>>> metadata file for each input file. What do you think about making it
>>> look at directories instead?
>>>
>>> Yang -- what's the ratio between # of directories and # of files in your
>>> case?
>>>
>>> D
>>>
>>>
>>> On Sat, Dec 31, 2011 at 6:05 PM, Yang Ling <[email protected]> wrote:
>>>
>>>> Thanks for the reply. I spent yesterday digging in and found that my 40
>>>> minutes are spent in JsonMetadata.findMetaFile. It seems this is new in
>>>> trunk. In my setup, I have several thousand files/folders in my input,
>>>> and findMetaFile reads them one by one, which takes a long time. I also
>>>> see there is an option in PigStorage to disable it with "-noschema". Once
>>>> I use "-noschema", I get my 40 minutes back. Can we do something so
>>>> others do not run into this pitfall?
>>>> At 2011-12-30 03:52:34, "Dmitriy Ryaboy" <[email protected]> wrote:
>>>> >In the past, when I've observed this kind of insane behavior (no job
>>>> >should take 40 minutes to submit), it's been due to the NameNode or the
>>>> >JobTracker being extremely overloaded, responding slowly, and causing
>>>> >timeouts+retries.
>>>> >
>>>> >2011/12/28 Thejas Nair <[email protected]>
>>>> >
>>>> >> I haven't seen/heard of this issue.
>>>> >> Do you mean to say that the extra time is actually a delay before the
>>>> >> MR job is launched?
>>>> >> Did you have free map/reduce slots when you ran the Pig job from trunk?
>>>> >>
>>>> >> Thanks,
>>>> >> Thejas
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> On 12/23/11 9:01 PM, Yang Ling wrote:
>>>> >>
>>>> >>> I have a Pig job that typically finishes in 20 minutes. With Pig code
>>>> >>> from trunk, it takes more than 1 hour to finish. My input and output
>>>> >>> are on Amazon S3. One interesting thing is that it takes about 40
>>>> >>> minutes to start the MapReduce job, but with the 0.9.1 release it
>>>> >>> takes less than 1 minute. Any idea?
>>>> >>>
>>>> >>
>>>> >>
>>>>
>>>>
>>>
>>
>
