Great. Following the wiki page http://wiki.apache.org/pig/PigStorageWithInputPath and setting -Dpig.noSplitCombination=true, I can now get the file name in Pig.
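
For reference, once the file name is available, the id between 'file_' and the trailing '_00001' could be pulled out with something like the sketch below. This is only an illustration in plain Java: the class name FileNameId and the method extractId are hypothetical, not part of Pig or the wiki code, and it assumes every file name has the shape file_<digits>_<digits>.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: pulls the id out of names like file_29_00001.
final class FileNameId {
    // Assumed name shape: "file_", the id digits, "_", then a numeric suffix.
    private static final Pattern NAME = Pattern.compile("file_(\\d+)_\\d+");

    // Returns the id from the last path component, or null if the name does not match.
    static String extractId(String path) {
        // Drop any directory or URI prefix; works for bare file names as well.
        String name = path.substring(path.lastIndexOf('/') + 1);
        Matcher m = NAME.matcher(name);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractId("hdfs://path/load/file_29_00001"));
    }
}
```

A loader that already knows its input path could call extractId on that path and append the result as an extra column. Returning null for names that do not match leaves the caller to decide whether to skip the file or fail.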
But I have another problem. I modified the UDF code, rebuilt it with ant, and generated a new jar file (I am sure the jar file was updated). Then I ran:

pig -x local
register /home/user/project/lib/myUDF.jar
a = load 'aaa';
b = foreach a generate com.company.pig.myUDF();
dump b;

The result still comes from the old jar file and UDF class, so I think the UDF classes have been cached somewhere. Am I right? And how can I make Pig use the newest jar file after recompiling?

Thanks very much.

2011/6/15 Daniel Dai <[email protected]>

> Check http://wiki.apache.org/pig/PigStorageWithInputPath, also you will
> need to disable split combination: -Dpig.noSplitCombination=true
>
> Daniel
>
>
> On 06/13/2011 04:07 AM, Jameson Li wrote:
>
> Hi,
>
> I have some files in hdfs://path/load/ like this:
> file_29_00001
> file_47_00001
> file_16_00001
> ...
> These files are generated by other M/R jobs. Each file contains only one
> column, and the number in the file name between 'file_' and '_00001' is an
> id.
> I want to add the id to the loaded data like this (I think I should
> write a LoadFunc to get the id):
> a = load '/path/load/' using com.company.pig.GetIDFromFileName();
> dump a;
> // here the relation 'a' will have two columns: one is the original column
> // and the other is the id.
>
> And my questions are these:
> 1, Is there an existing function that I can use to get the id from the file
> name?
> 2, I think this method in Pig 0.6.0 can help me:
> bindTo(String fileName, BufferedPositionedInputStream in, long offset, long end)
> <http://pig.apache.org/docs/r0.6.0/api/org/apache/pig/builtin/PigStorage.html#bindTo(java.lang.String,org.apache.pig.impl.io.BufferedPositionedInputStream,long,long)>
> Specifies a portion of an InputStream to read tuples.
> but I can't find the same method in Pig 0.8.1.
> Which method can I use to operate on the input file in the Pig 0.8.1 API?
>
> Thanks very much.
