Hi,

I have some files in hdfs://path/load/ like this:

    file_29_00001
    file_47_00001
    file_16_00001
    ...

These files are generated by other M/R jobs. Each file contains only one column, and the number in the file name between 'file_' and '_00001' is an id. I want to add that id to every record as it is loaded. I think I need to write a LoadFunc to get the id, and then use it like this:

    a = LOAD '/path/load/' USING com.company.pig.GetIDFromFileName();
    DUMP a;  -- here the relation 'a' should have two columns: the original column and the id
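To make the parsing step concrete, here is a minimal sketch in plain Java of how I would pull the id out of a file name. The class and method names (FileIdParser, extractId) are hypothetical, and I am assuming the suffix after the id is always a run of digits such as 00001. It deliberately does not use any Pig API, because how to get hold of the file name inside the loader is exactly what I am asking about.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; it is not part of Pig.
final class FileIdParser {
    // Matches names such as file_29_00001. Group 1 is the id between
    // "file_" and the trailing "_<digits>" suffix (assumed to be all digits).
    private static final Pattern NAME = Pattern.compile("file_(\\d+)_\\d+");

    private FileIdParser() {}

    // Accepts either a bare file name or a full path and returns the id.
    static int extractId(String pathOrName) {
        // Keep only the last path component, e.g. "file_29_00001".
        String name = pathOrName.substring(pathOrName.lastIndexOf('/') + 1);
        Matcher m = NAME.matcher(name);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected file name: " + name);
        }
        return Integer.parseInt(m.group(1));
    }

    public static void main(String[] args) {
        // Example path taken from the listing above.
        System.out.println(extractId("hdfs://path/load/file_29_00001"));
    }
}
```

The loader would then only need to call extractId once per input file and append the result to each tuple it returns.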
My questions are:

1. Is there an existing function that can get the id from the file name?

2. I think this method from PigStorage in Pig 0.6.0 could help me:

    bindTo(String fileName, BufferedPositionedInputStream in, long offset, long end)
    Specifies a portion of an InputStream to read tuples.
    <http://pig.apache.org/docs/r0.6.0/api/org/apache/pig/builtin/PigStorage.html#bindTo(java.lang.String, org.apache.pig.impl.io.BufferedPositionedInputStream, long, long)>

   However, I can't find the same method in Pig 0.8.1. Which method in the Pig 0.8.1 API can I use to get at the input file?

Thanks very much.
