Sorry. Moving the 'hbase' mailing list to BCC because this is not related to HBase. Adding the 'hadoop' user group.
On Mon, Feb 11, 2013 at 10:22 AM, Something Something <[email protected]> wrote:

> Hello,
>
> We are running into performance issues with Pig/Hadoop because our input
> files are small. Everything goes to only 1 Mapper. To get around this, we
> are trying to use our own Loader like this:
>
> 1) Extend PigStorage:
>
> public class SmallFileStorage extends PigStorage {
>
>     public SmallFileStorage(String delimiter) {
>         super(delimiter);
>     }
>
>     @Override
>     public InputFormat getInputFormat() {
>         return new NLineInputFormat();
>     }
> }
>
> 2) Add command line argument to the Pig command as follows:
>
> -Dmapreduce.input.lineinputformat.linespermap=500000
>
> 3) Use SmallFileStorage in the Pig script as follows:
>
> USING com.xxx.yyy.SmallFileStorage ('\t')
>
> But this doesn't seem to work. We still see that everything is going to
> one mapper. Before we spend any more time on this, I am wondering if this
> is a good approach – OR – if there's a better approach? Please let me
> know. Thanks.
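
A rough sanity check on the value in step 2, offered as a sketch rather than a diagnosis: assuming NLineInputFormat yields roughly one split per N lines of each input file, any file with fewer than 500000 lines would still end up in a single split, so that setting may simply be too large for small inputs. The class and method names below (SplitEstimate, estimateSplits) and the sample line counts are hypothetical and are not part of Pig or Hadoop. The sketch also ignores any split combination Pig may apply afterwards (for example via the pig.splitCombination property).

```java
// Hypothetical helper for illustration only; not part of Pig or Hadoop.
final class SplitEstimate {

    // Estimated number of splits for one file, assuming one split per
    // `linesPerMap` lines: ceil(linesInFile / linesPerMap).
    // Returns 0 for an empty file in this sketch.
    static long estimateSplits(long linesInFile, long linesPerMap) {
        if (linesPerMap <= 0) {
            throw new IllegalArgumentException("linesPerMap must be positive");
        }
        if (linesInFile <= 0) {
            return 0;
        }
        // Integer ceiling division without floating point.
        return (linesInFile + linesPerMap - 1) / linesPerMap;
    }

    public static void main(String[] args) {
        long linesPerMap = 500_000L; // value from step 2 of the quoted message
        long[] sampleLineCounts = {20_000L, 500_000L, 1_200_000L}; // made-up file sizes
        for (long lines : sampleLineCounts) {
            System.out.println(lines + " lines -> "
                    + estimateSplits(lines, linesPerMap) + " split(s)");
        }
    }
}
```

Under this assumption, lowering linespermap well below the line count of a typical input file would be the first thing to try, and it may be worth checking whether Pig is merging the resulting small splits back together.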
