Filed https://issues.apache.org/jira/browse/PIG-2453

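A rough way to see why the directory-to-file ratio matters: if each metadata probe costs one remote round trip (S3 in Yang's case), a per-file lookup pays that cost once per input file, while a per-directory lookup pays it once per distinct parent directory. The Python below is a minimal sketch under that assumption only. It is not Pig's JsonMetadata code, and the function names, the `.pig_schema` file name and the sample S3 layout are illustrative assumptions.

```python
"""Hypothetical model of metadata-file probing; not Pig's JsonMetadata code.

Each returned path stands for one remote existence check (e.g. an S3 or
HDFS round trip) made while looking for a schema/metadata file.
"""
import posixpath
from typing import Iterable, List

# Assumed metadata file name, used only for illustration.
META_NAME = ".pig_schema"


def per_file_probes(paths: Iterable[str]) -> List[str]:
    """Per-file strategy: one probe for every input file, even when many
    files share the same parent directory (so probes repeat)."""
    return [posixpath.join(posixpath.dirname(p), META_NAME) for p in paths]


def per_directory_probes(paths: Iterable[str]) -> List[str]:
    """Per-directory strategy: one probe per distinct parent directory,
    in first-seen order."""
    dirs = dict.fromkeys(posixpath.dirname(p) for p in paths)
    return [posixpath.join(d, META_NAME) for d in dirs]


if __name__ == "__main__":
    # Hypothetical input layout: 20 directories with 200 part files each.
    inputs = [f"s3://bucket/logs/day={d:02d}/part-{i:05d}"
              for d in range(20) for i in range(200)]
    print(len(per_file_probes(inputs)))       # 4000
    print(len(per_directory_probes(inputs)))  # 20
```

With 20 directories of 200 files each, the per-directory variant makes 20 probes instead of 4,000. If every file sat in its own directory the two counts would be equal, which is why the ratio of directories to files decides how much such a change would help.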
On Sun, Jan 1, 2012 at 5:17 PM, Dmitriy Ryaboy <[email protected]> wrote:

> Ah. That's unfortunate. Yeah, reading thousands of small files is
> suboptimal (it's always suboptimal, but in this case it's extra bad).
>
> Pig committers -- currently JsonMetadata.findMetaFile looks for a
> metadata file for each file. What do you think about making it look at
> directories instead?
>
> Yang -- what's the ratio between # of directories and # of files in your
> case?
>
> D
>
>
> On Sat, Dec 31, 2011 at 6:05 PM, Yang Ling <[email protected]> wrote:
>
>> Thanks for the reply. I spent yesterday digging into this and found that
>> the 40 minutes is spent in JsonMetadata.findMetaFile, which seems to be
>> new in trunk. My input has several thousand files/folders, and
>> findMetaFile reads them one by one, which takes a long time. I also see
>> there is an option in PigStorage to disable it: "-noschema". Once I use
>> "-noschema", I get my 40 minutes back. Can we do something so others do
>> not fall into this pitfall?
>> At 2011-12-30 03:52:34,"Dmitriy Ryaboy" <[email protected]> wrote:
>> >In the past, when I've observed this kind of insane behavior (no job
>> >should take 40 minutes to submit), it's been due to the NameNode or the
>> >JobTracker being extremely overloaded and responding slowly, causing
>> >timeouts and retries.
>> >
>> >2011/12/28 Thejas Nair <[email protected]>
>> >
>> >> I haven't seen or heard of this issue.
>> >> Do you mean that the extra time is actually a delay before the MR job
>> >> is launched?
>> >> Did you have free map/reduce slots when you ran the Pig job from trunk?
>> >>
>> >> Thanks,
>> >> Thejas
>> >>
>> >>
>> >>
>> >>
>> >> On 12/23/11 9:01 PM, Yang Ling wrote:
>> >>
>> >>> I have a Pig job that typically finishes in 20 minutes. When I tried
>> >>> Pig code from trunk, it took more than 1 hour to finish. My input and
>> >>> output are on Amazon S3. One interesting thing is that it takes about
>> >>> 40 minutes to start the MapReduce job, whereas with the 0.9.1 release
>> >>> it takes less than 1 minute. Any idea?
>> >>>
>> >>
>> >>
>>
>>
>
