I have a Pig script that is essentially a map-only job:

    raw = LOAD 'input.txt';
    processed = FOREACH raw GENERATE convert_somehow($1, $2, ...);
    STORE processed INTO 'output.txt';

I have many nodes in my cluster, so I want Pig to process the input with more mappers, but it generates only two part-m-xxxxx files, i.e. it uses only two mappers. With a plain Hadoop job it is possible to set the mapper count and pass -Dmapred.min.split.size=...; would that also work for Pig? The PARALLEL keyword only controls the number of reducers.
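For reference, here is the kind of thing I have in mind. Both forms are sketches: I am not sure whether Pig actually forwards these properties to the underlying Hadoop job for split sizing, and script.pig and the 1 MB value are just placeholders.

    -- Sketch: set the Hadoop split-size property from inside the script,
    -- assuming Pig's SET statement forwards arbitrary properties to the job conf.
    set mapred.min.split.size 1048576;  -- 1 MB, placeholder value

    raw = LOAD 'input.txt';
    processed = FOREACH raw GENERATE convert_somehow($1, $2, ...);
    STORE processed INTO 'output.txt';

or on the command line (again assuming -D properties are passed through to Hadoop):

    pig -Dmapred.min.split.size=1048576 script.pig

Thanks
Yang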
