Have you looked at SequenceFileLoader for Pig? http://pig.apache.org/docs/r0.11.0/api/org/apache/pig/piggybank/storage/SequenceFileLoader.html
Regards,
Shahab

On Wed, Oct 23, 2013 at 3:30 PM, Sameer Tilak <ssti...@live.com> wrote:

> Hi There,
>
> I have a lot of small (~0.5 MB to 3 MB) XML files that I would like to
> process using Apache Pig. Since dealing with a lot of small files is
> problematic, I was thinking of creating SequenceFiles such that each
> sequence file is between 60 and 64 MB and no XML file is split across 2
> Sequence Files. Is there any utility that does the storing and loading of
> these files from Pig? I can, for example, create a Pig job that would read
> these XML files and generate a few large sequence files such that no XML
> file is split across 2 Sequence Files. I will then write another Pig job
> that will load these sequence files and then analyze them. Each of these
> XML files contains a lot of information for a given entity and the nesting
> can be quite deep. Any help with this would be great.
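
For what it's worth, the grouping step described in the question (whole XML files packed into batches of at most 64 MB) can be planned before anything is written. Below is a minimal sketch in plain Python, not Pig and not a Hadoop API; `plan_batches`, the file names, and the sizes are all made up for illustration. It only decides which files go together. Writing each batch out as a sequence file would be a separate step, and the sketch ignores SequenceFile overhead (headers, sync markers, keys), so leaving some headroom below 64 MB is probably wise.

```python
from typing import Iterable, List, Tuple

MB = 1024 * 1024


def plan_batches(
    files: Iterable[Tuple[str, int]], max_bytes: int = 64 * MB
) -> List[List[str]]:
    """Greedily group (name, size_in_bytes) pairs into batches.

    A file is never split: when adding the next file would push the
    current batch past max_bytes, the batch is closed and a new one
    is started. A single file larger than max_bytes ends up alone in
    its own (oversized) batch.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        # Close the current batch if this file would not fit in it.
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Hypothetical input: 50 XML files of 3 MB each.
    sample = [(f"doc{i}.xml", 3 * MB) for i in range(50)]
    for index, batch in enumerate(plan_batches(sample)):
        print(f"batch {index}: {len(batch)} files")
```

With files no larger than 3 MB, every batch except the last closes only when the next file would not fit, so each full batch holds more than 61 MB, which falls inside the 60 to 64 MB target. The last batch simply holds whatever is left over.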