Re: Loader for small files

Something Something Mon, 11 Feb 2013 10:25:25 -0800

Sorry, moving the 'hbase' mailing list to BCC because this is not related to
HBase.  Adding the 'hadoop' user group.


On Mon, Feb 11, 2013 at 10:22 AM, Something Something <
[email protected]> wrote:

> Hello,
>
> We are running into performance issues with Pig/Hadoop because our input
> files are small.  Everything goes to a single mapper.  To get around this,
> we are trying to use our own loader, like this:
>
> 1)  Extend PigStorage:
>
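> import org.apache.hadoop.mapreduce.InputFormat;
> import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
> import org.apache.pig.builtin.PigStorage;
>
> // PigStorage variant that splits input by line count instead of by block.
> public class SmallFileStorage extends PigStorage {
>
>     public SmallFileStorage(String delimiter) {
>         super(delimiter);
>     }
>
>     // Each split holds at most
>     // mapreduce.input.lineinputformat.linespermap lines.
>     @Override
>     public InputFormat getInputFormat() {
>         return new NLineInputFormat();
>     }
> }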
>
>
>
> 2)  Add command line argument to the Pig command as follows:
>
> -Dmapreduce.input.lineinputformat.linespermap=500000
>
>
>
> 3)  Use SmallFileStorage in the Pig script as follows:
>
> USING com.xxx.yyy.SmallFileStorage ('\t')
>
>
> But this doesn't seem to work.  We still see everything going to one
> mapper.  Before we spend any more time on this, I am wondering whether
> this is a good approach, or whether there is a better one.  Please let me
> know.  Thanks.
>
>
>
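
One thing worth checking against the setup quoted above: NLineInputFormat
puts at most N lines of a file into each split, so the number of splits per
file is roughly ceil(lines / N).  The sketch below only does that arithmetic.
The class and method names (SplitEstimate, splitsFor) are hypothetical and the
line counts are made-up examples, not figures from our data.  It does not call
Hadoop or Pig, and it ignores any split combining that Pig may apply afterwards.

```java
public class SplitEstimate {

    // Hypothetical helper: estimated number of splits for ONE input file,
    // assuming each split holds at most linesPerMap lines.
    static long splitsFor(long totalLines, int linesPerMap) {
        if (linesPerMap <= 0) {
            throw new IllegalArgumentException("linesPerMap must be positive");
        }
        if (totalLines <= 0) {
            return 0; // assumption: an empty file contributes no split
        }
        // Integer ceiling of totalLines / linesPerMap.
        return (totalLines + linesPerMap - 1) / linesPerMap;
    }

    public static void main(String[] args) {
        int linesPerMap = 500_000; // value passed via -D in step 2
        // Illustrative line counts only.
        long[] examples = {40_000L, 500_000L, 1_200_000L};
        for (long lines : examples) {
            System.out.println(lines + " lines -> "
                    + splitsFor(lines, linesPerMap) + " split(s)");
        }
    }
}
```

Under this estimate, any file with 500000 lines or fewer still yields a single
split, so a small input would need a much lower linespermap value before more
than one mapper could be expected.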
