Thanks, but from http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#set it looks like the parameters that can be 'set' are very limited, and they do not include the min split size and mapper count that I want.
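
For context, here is my rough understanding of how the split size would drive the mapper count, as a small Python sketch. It assumes the rule used by Hadoop's newer FileInputFormat, split size = max(min size, min(max size, block size)), and it ignores details such as the slack Hadoop allows on the last split and any split combining Pig may do. The function names and the 100 MB / 64 MB figures are made up for illustration; they are not taken from my actual job.

```python
MB = 1024 * 1024
NO_LIMIT = 2**63 - 1  # stand-in for "no max split size configured"


def split_size(block_size, min_size=1, max_size=NO_LIMIT):
    """Assumed rule: max(min_size, min(max_size, block_size))."""
    return max(min_size, min(max_size, block_size))


def estimated_mappers(file_size, block_size, min_size=1, max_size=NO_LIMIT):
    """Estimate mappers for one splittable file as ceil(file_size / split_size)."""
    if file_size <= 0:
        return 0
    size = split_size(block_size, min_size, max_size)
    return -(-file_size // size)  # integer ceiling division


# Hypothetical 100 MB input on 64 MB blocks.
print(estimated_mappers(100 * MB, 64 * MB))                      # 2
print(estimated_mappers(100 * MB, 64 * MB, max_size=16 * MB))    # 7
print(estimated_mappers(100 * MB, 64 * MB, min_size=128 * MB))   # 1
```

If this model is roughly right, raising the min split size on its own can only reduce the number of mappers; getting more than 2 would need a max split size smaller than the block size.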

On Wed, Jan 11, 2012 at 9:52 PM, Dmitriy Ryaboy <[email protected]> wrote:
> Yes, you can use the "set" keyword to set such properties in the script.
>
> On Jan 11, 2012, at 6:12 PM, Yang <[email protected]> wrote:
>
> > I have a pig script that does basically a map-only job:
> >
> > raw = LOAD 'input.txt' ;
> >
> > processed = FOREACH raw GENERATE convert_somehow($1,$2...);
> >
> > store processed into 'output.txt';
> >
> >
> > I have many nodes on my cluster, so I want PIG to process the input in
> > more mappers. but it generates only 2 part-m-xxxxx files, i.e.
> > using 2 mappers.
> >
> > in hadoop job it's possible to pass mapper count and
> > -Dmapred.min.split.size= , would this also work for PIG? the PARALLEL
> > keyword only works for reducers
> >
> >
> > Thanks
> > Yang
