Thanks Xiaomeng!

2011/5/31 Xiaomeng Wan <[email protected]>

> I asked a similar question before. Please see this thread
>
>
> http://mail-archives.apache.org/mod_mbox/pig-user/201103.mbox/%[email protected]%3E
>
> Shawn
>
> On Tue, May 31, 2011 at 11:08 AM, Jonathan Coveney <[email protected]>
> wrote:
> > Context: I have a bunch of files living in HDFS, and I think my jobs are
> > failing on one of them... I want to output the files that the job is
> > failing on.
> >
> > I thought that I could just make my own LoadFunc that followed the same
> > methodology as PigStorage, but caught exceptions and logged the file that
> > was given... this isn't working, however. I tried returning loadLocation,
> > but that is the globbed input, not the input to the mapper. I also tried
> > reading mapreduce.map.input.file and map.input.file from the Job given to
> > setLocation, but both were null... I think this is where some of my
> > ignorance of Pig's internal workings is coming into play, as I'm not sure
> > when files are deglobbed and the splits are actually read. I tried using
> > getLocations() from the PigSplit passed to prepareToRead, but that was
> > just the glob as well...
> >
> > My next thought would be to make a RecordReader that outputs the file
> > associated with its splits (as I assume it must have the specific files
> > it is processing?), but I thought I'd ask if there was a cleaner way
> > before doing that...
> >
> > Thanks!
> > Jon
> >
>
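
For anyone finding this thread later: one way to get the concrete file behind each split is to look at the Hadoop split wrapped inside the PigSplit handed to prepareToRead, rather than at loadLocation or getLocations(). The sketch below is a rough illustration only, assuming Pig 0.8.x-era APIs: it assumes PigSplit exposes the underlying Hadoop split via getWrappedSplit() and that the wrapped split is a FileSplit carrying the path. The class name FileLoggingStorage is made up for the example; verify the method names against your Pig version.

```java
// Sketch: a PigStorage subclass that remembers which concrete file its
// current split comes from, so read failures can name the bad file.
// Assumes Pig 0.8.x-style LoadFunc APIs; not a drop-in, tested solution.
import java.io.IOException;

import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.builtin.PigStorage;
import org.apache.pig.data.Tuple;

public class FileLoggingStorage extends PigStorage {

    private String currentFile = "(unknown)";

    @Override
    public void prepareToRead(RecordReader reader, PigSplit split)
            throws IOException {
        // PigSplit wraps the real Hadoop InputSplit; for file-based input
        // that wrapped split should be a FileSplit holding the actual path.
        InputSplit wrapped = split.getWrappedSplit();
        if (wrapped instanceof FileSplit) {
            currentFile = ((FileSplit) wrapped).getPath().toString();
        }
        super.prepareToRead(reader, split);
    }

    @Override
    public Tuple getNext() throws IOException {
        try {
            return super.getNext();
        } catch (Exception e) {
            // Surface the concrete file, not the glob, in the failure.
            throw new IOException("Failed while reading " + currentFile, e);
        }
    }
}
```

The same idea works from a custom RecordReader, but overriding prepareToRead keeps it in one place and reuses PigStorage's parsing unchanged.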
