Thanks Xiaomeng!

2011/5/31 Xiaomeng Wan <[email protected]>:
> I asked a similar question before. Please see this thread:
>
> http://mail-archives.apache.org/mod_mbox/pig-user/201103.mbox/%[email protected]%3E
>
> Shawn
>
> On Tue, May 31, 2011 at 11:08 AM, Jonathan Coveney <[email protected]> wrote:
> > Context: I have a bunch of files living in HDFS, and I think my jobs are
> > failing on one of them... I want to output the files that the job is
> > failing on.
> >
> > I thought that I could just make my own LoadFunc that followed the same
> > methodology as PigStorage, but caught exceptions and logged the file that
> > was given... this isn't working, however. I tried returning loadLocation,
> > but that is the globbed input, not the input to the mapper. I also tried
> > reading mapreduce.map.file.input and map.file.input from the Job given to
> > setLocation, but both were null... I think this is where some of my
> > ignorance as to pig's internal workings is coming into play, as I'm not
> > sure when files are deglobbed and the splits are actually read. I tried
> > using getLocations() from the PigSplit passed to prepareToRead, but that
> > was just the glob as well...
> >
> > My next thought would be to make a RecordReader that outputs the file
> > associated with its splits (as I assume that this should have to have the
> > specific files it is processing?), but I thought I'd ask if there was a
> > cleaner way before doing that...
> >
> > Thanks!
> > Jon
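For later readers: the approach that the linked thread points toward is to unwrap the actual split inside prepareToRead, rather than looking at the glob-level location. A rough sketch (untested, assuming a Pig version where PigSplit exposes getWrappedSplit() and the wrapped split is a new-API FileSplit; the class and variable names are illustrative, not from the thread):

```java
import java.io.IOException;

import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.builtin.PigStorage;
import org.apache.pig.data.Tuple;

// Hypothetical LoadFunc that remembers which concrete file each mapper
// is reading, so read failures can name the offending file.
public class PathLoggingStorage extends PigStorage {

    private String currentFile = "unknown";

    @Override
    @SuppressWarnings("rawtypes")
    public void prepareToRead(RecordReader reader, PigSplit split)
            throws IOException {
        super.prepareToRead(reader, split);
        // getLocations() reflects the glob; getWrappedSplit() is the
        // InputSplit actually handed to this mapper, typically a FileSplit.
        if (split.getWrappedSplit() instanceof FileSplit) {
            currentFile =
                ((FileSplit) split.getWrappedSplit()).getPath().toString();
        }
    }

    @Override
    public Tuple getNext() throws IOException {
        try {
            return super.getNext();
        } catch (Exception e) {
            // Surface the concrete file in the failure message.
            throw new IOException("Failed while reading " + currentFile, e);
        }
    }
}
```

This keeps PigStorage's parsing behavior intact and only wraps it to attach the file path to any exception, which matches the "log the file the job is failing on" goal without writing a custom RecordReader.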
