You just have to drop down to the Hadoop level to do this. Implement a
custom InputFormat / RecordReader; the record reader gets a normal
Java stream.
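
A rough sketch of the stream-parsing half, using only the JDK's StAX
parser so it runs without Hadoop on the classpath. The class name
XmlStreamRecords and the "record" element name are made up for
illustration, and it assumes each record element holds only text. The
two methods mirror the shape of RecordReader.nextKeyValue() and
getCurrentValue(), but this is not a drop-in RecordReader.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: pulls the text of each <recordTag> element out of a
// plain InputStream, one record per call.
class XmlStreamRecords {
    private final XMLStreamReader xml;
    private final String recordTag;
    private String current;

    XmlStreamRecords(InputStream in, String recordTag) {
        try {
            this.xml = XMLInputFactory.newInstance().createXMLStreamReader(in);
        } catch (XMLStreamException e) {
            // A real RecordReader would likely rethrow this as an IOException.
            throw new IllegalStateException("could not open XML stream", e);
        }
        this.recordTag = recordTag;
    }

    // Advance to the next record element; returns false once the document
    // is exhausted.
    boolean nextKeyValue() {
        try {
            while (xml.hasNext()) {
                if (xml.next() == XMLStreamConstants.START_ELEMENT
                        && recordTag.equals(xml.getLocalName())) {
                    // Assumes text-only content; getElementText() throws
                    // if the element contains child elements.
                    current = xml.getElementText();
                    return true;
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("malformed XML", e);
        }
        current = null;
        return false;
    }

    // Text of the record found by the last successful nextKeyValue() call.
    String getCurrentValue() {
        return current;
    }

    public static void main(String[] args) {
        byte[] doc = "<root><record>a</record><record>b</record></root>"
                .getBytes(StandardCharsets.UTF_8);
        XmlStreamRecords records =
                new XmlStreamRecords(new ByteArrayInputStream(doc), "record");
        while (records.nextKeyValue()) {
            System.out.println(records.getCurrentValue());
        }
    }
}
```

In an actual RecordReader, initialize() would take the path from the
FileSplit and call FileSystem.open() on it; the FSDataInputStream that
comes back is an ordinary InputStream and could be handed to a parser
like the one above. Since the XML can't be cut at arbitrary offsets,
the InputFormat would probably also override isSplitable() to return
false so the whole file goes to one reader.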

D

On Fri, Jan 13, 2012 at 4:12 AM, Rory McCann <[email protected]> wrote:
> Hi all,
>
> I'm new to Pig (and a bit rusty with Java!) and still just playing
> around with it, nothing serious yet. I might be misunderstanding
> something important here.
>
> I'm trying to write a custom loader for a custom XML file format, i.e.
> deserialize the XML into Pig data types. However, all the documentation
> and other code is based on taking a RecordReader and spitting out things
> from getNext().
>
> Is there any way to make a custom loader that works on InputStreams or
> more common java-io-y type stuff? I'd like to use more commonly
> available XML parsers (which work on these). Since it's XML, line by
> line parsing doesn't really work. I will just have one input file that
> will be parsed. Is there some reason why there are no InputStreams?
>
> I have also asked this question on StackOverflow:
> http://stackoverflow.com/questions/8843790/custom-apache-pig-loadfunc-where-can-i-get-the-inputstream-on-the-file
>
> --
> Rory
>
