Thanks, but from http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#set
it looks like the params that can be 'set' are very limited, and do not include
the min split size and mapper count that I want.
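
For context, here is a rough sketch of how I understand Hadoop's FileInputFormat picks a split size, and why that drives the mapper count. It is an approximation, not Pig's or Hadoop's actual code: the function names are made up for illustration, the real implementation also lets the last split run slightly over, and I am assuming Pig passes mapred.min.split.size / mapred.max.split.size through to the input format unchanged. If the formula holds, raising the min split size would give fewer mappers, and it is a smaller max split size that would give more.

```python
import math

MB = 1024 * 1024


def compute_split_size(block_size, min_size, max_size):
    """Split size as max(min_size, min(max_size, block_size)).

    Mirrors the formula used by Hadoop's new-API FileInputFormat;
    the function name here is illustrative only.
    """
    return max(min_size, min(max_size, block_size))


def estimate_mappers(file_size, block_size, min_size=1, max_size=2**63 - 1):
    """Approximate number of map tasks for one splittable input file.

    Simplification (assumption): ignores the small slack Hadoop allows
    on the final split, so the real count can be one lower.
    """
    split_size = compute_split_size(block_size, min_size, max_size)
    return math.ceil(file_size / split_size)


# Hypothetical input: a 256 MB file stored in 128 MB blocks.
default_mappers = estimate_mappers(256 * MB, 128 * MB)
# Capping the split size at 32 MB yields more, smaller splits.
more_mappers = estimate_mappers(256 * MB, 128 * MB, max_size=32 * MB)
# Raising the minimum split size above the block size yields fewer.
fewer_mappers = estimate_mappers(256 * MB, 128 * MB, min_size=256 * MB)
```

With these made-up sizes the default comes out at 2 mappers, which would be consistent with the 2 part-m-xxxxx files I am seeing, but I have not checked my actual block size.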



On Wed, Jan 11, 2012 at 9:52 PM, Dmitriy Ryaboy <[email protected]> wrote:

> Yes, you can use the "set" keyword to set such properties in the script.
>
> On Jan 11, 2012, at 6:12 PM, Yang <[email protected]> wrote:
>
> > I have a pig script  that does basically a map-only job:
> >
> > raw = LOAD 'input.txt' ;
> >
> > processed = FOREACH raw GENERATE convert_somehow($1,$2...);
> >
> > store processed into 'output.txt';
> >
> >
> >
> > I have many nodes on my cluster, so I want PIG to process the input in
> > more mappers. but it generates only 2 part-m-xxxxx  files, i.e.
> > using 2 mappers.
> >
> > in hadoop job it's possible to pass mapper count and
> > -Dmapred.min.split.size= ,  would this also work for PIG? the PARALLEL
> > keyword only works for reducers
> >
> >
> > Thanks
> > Yang
>
