It sounds like you need to write your own RecordReader (and associated InputFormat).
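
The stock line-based reader strips the line terminator before handing back the Text value, which is why the String in getNext() cannot tell you how the line ended. A minimal sketch of the line-splitting logic a custom reader could use is below. It is plain Java with no Hadoop dependency, and the names TerminatorAwareLineReader, Line, and endedWithNewline are hypothetical, made up for illustration. In a real RecordReader this logic would sit behind nextKeyValue(), and the flag would have to be carried in (or alongside) the value the loader sees.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: reads lines and records whether each ended with '\n' or at EOF. */
public class TerminatorAwareLineReader {

    /** One line of text plus a flag saying how it ended. */
    public static final class Line {
        public final String text;
        public final boolean endedWithNewline; // false means the line was cut off by EOF

        Line(String text, boolean endedWithNewline) {
            this.text = text;
            this.endedWithNewline = endedWithNewline;
        }
    }

    private final InputStream in;

    public TerminatorAwareLineReader(InputStream in) {
        this.in = in;
    }

    /** Returns the next line, or null once the stream is exhausted. */
    public Line next() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        // Byte-at-a-time for clarity; a real reader would read in larger chunks.
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                return new Line(buf.toString("UTF-8"), true);
            }
            buf.write(b);
        }
        // EOF reached: emit a final unterminated line only if bytes were read.
        if (buf.size() == 0) {
            return null;
        }
        return new Line(buf.toString("UTF-8"), false);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "first\nsecond".getBytes(StandardCharsets.UTF_8);
        TerminatorAwareLineReader reader =
                new TerminatorAwareLineReader(new ByteArrayInputStream(data));
        Line line;
        while ((line = reader.next()) != null) {
            System.out.println(line.text + " -> endedWithNewline=" + line.endedWithNewline);
        }
    }
}
```

With the sample input above, the first line is reported as newline-terminated and the second as ending at EOF. This sketch ignores input splits: a record that straddles a split boundary needs the same care the built-in reader takes, so treat it as a starting point only.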

D

On Tue, Jul 26, 2011 at 11:12 AM, <[email protected]> wrote:
> Hello,
>
> I have a custom loader function to read in a parsed schema from some log
> files, but it seems there is a problem with some of the log files and I need
> to detect if the end of a line in the log does not end with '\n' or is EOF
> when loading from the file. I'm currently running Pig 0.8.0 with Hadoop
> 0.20.2, and I'm using the RecordReader class in my loader function to read
> in from a text file in the following way:
>
> RecordReader in = null;
>
> public Tuple getNext() throws IOException {
>     try {
>         boolean notDone = in.nextKeyValue();
>         if (!notDone) {
>             return null;
>         }
>         Text tval = (Text)in.getCurrentValue();
>         String val = tval.toString();
>
> However, there's no way using this method to check for '\n' or EOF in the
> String val, so I'm not sure if it's possible to use another type of Record
> Reader or some other method to check for these values. Any suggestions on
> how to do this in a custom Pig loader function?
>
