Daniel,

Thank you very much for your answer and the concrete example. It did
solve our problem!

Thanks again,

Sang


On Tue, Nov 23, 2010 at 1:27 PM, Daniel Dai <[email protected]> wrote:
> I remember we did something similar before. FileSplit.getPath() does give
> you a hold of the file name.
>
> Here is some sample code:
>
> import java.io.IOException;
>
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.mapreduce.RecordReader;
> import org.apache.hadoop.mapreduce.lib.input.FileSplit;
> import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
> import org.apache.pig.builtin.PigStorage;
> import org.apache.pig.data.Tuple;
>
> public class PigStorageWithInputPath extends PigStorage {
>   Path path = null;
>
>   @Override
>   public void prepareToRead(RecordReader reader, PigSplit split) {
>       super.prepareToRead(reader, split);
>       // Remember the path of the file backing the split being read
>       path = ((FileSplit)split.getWrappedSplit()).getPath();
>   }
>
>   @Override
>   public Tuple getNext() throws IOException {
>       Tuple myTuple = super.getNext();
>       // Append the source file path as an extra field on every tuple
>       if (myTuple != null)
>           myTuple.append(path.toString());
>       return myTuple;
>   }
> }
>
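> For reference, a minimal sketch of how such a loader could be invoked from
> a Pig script (the jar name and alias below are illustrative, not taken from
> this thread):
>
> -- register the jar containing the loader, then load with it
> REGISTER pigstorage-with-input-path.jar;
> A = LOAD '/test/data/*.dat' USING PigStorageWithInputPath();
> -- each tuple now carries its source file path as the last field, so the
> -- date embedded in the file name is available to the rest of the script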
>
> Does it solve your problem?
>
> Daniel
>
> Sangchul Song wrote:
>>
>> Hi all,
>>
>> Our dataset consists of multiple files. The name of each file reflects
>> the file's creation date (e.g., 20101031.dat, 20101101.dat, etc.).
>> We need this date information for every record inside the file, but
>> there is no date field.
>>
>> We first considered accessing the file name through a UDF that
>> extends LoadFunc, but it doesn't appear to be possible. In
>> particular, 'location' in setLocation(String location, Job job) only
>> gives the original glob expression used in LOAD (such as
>> '/test/data/*.dat'), and 'reader' in prepareToRead(RecordReader
>> reader, PigSplit split) doesn't expose a method for file name access.
>>
>> Before we individually add the date field to every single file (which
>> we want to leave as a last resort, considering the number of files
>> we deal with), we were wondering if there's any way to access the file
>> name within a Pig script (including UDFs), especially when loading
>> multiple files at the same time. Any help would be greatly
>> appreciated.
>>
>> FYI, we are on Pig 0.7.0 running on top of Hadoop 0.20.2.
>>
>> Thanks,
>>
>> Sang
>>
>
>
