I have a Pig script that is essentially a map-only job:

    raw = LOAD 'input.txt';
    processed = FOREACH raw GENERATE convert_somehow($1, $2, ...);
    STORE processed INTO 'output.txt';

I have many nodes in my cluster, so I want Pig to process the input with more mappers, but it generates only two part-m-xxxxx files, i.e. it uses only two mappers. With a plain Hadoop job it is possible to set the mapper count and pass -Dmapred.min.split.size=...; would that also work for Pig? The PARALLEL keyword only controls the number of reducers.
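For reference, here is the kind of thing I have in mind. Both forms are sketches: I am not sure whether Pig actually forwards these properties to the underlying Hadoop job for split sizing, and script.pig and the 1 MB value are just placeholders.

    -- Sketch: set the Hadoop split-size property from inside the script,
    -- assuming Pig's SET statement forwards arbitrary properties to the job conf.
    set mapred.min.split.size 1048576;  -- 1 MB, placeholder value

    raw = LOAD 'input.txt';
    processed = FOREACH raw GENERATE convert_somehow($1, $2, ...);
    STORE processed INTO 'output.txt';

or on the command line (again assuming -D properties are passed through to Hadoop):

    pig -Dmapred.min.split.size=1048576 script.pig

Thanks
Yang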
