What Harsh means is that you should write a custom partitioner that partitions records based on the input record data (key, value). That is, if you have multiple inputs and each mapper emits key/value pairs, you should embed something in your key or value that tells you which dataset the record came from (e.g., if your value is a Text, prefix it: dataset1+value, dataset2+value, etc.). Using that tag, your partitioner can either delegate to multiple sub-Partitioner implementations or be a single partitioner that handles all the different cases itself.
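To make the idea concrete, here is a minimal, Hadoop-free sketch of that delegating approach: each key carries a dataset tag as a prefix, and one top-level partition function looks at the tag and routes to a per-dataset rule. The tag names ("dataset1"/"dataset2"), the ':' separator, and the per-dataset rules are all hypothetical choices for illustration; in a real job this logic would live inside a class extending Hadoop's Partitioner.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

// Sketch of a "high level" partitioner that dispatches on a dataset tag
// embedded in the key. Tags, separator, and sub-rules are assumptions,
// not Hadoop APIs.
public class DelegatingPartitioner {

    // One sub-partitioning rule per dataset tag. The rules below are just
    // examples: a plain non-negative hash for dataset1, and a (contrived)
    // key-length rule for dataset2, to show the rules can differ.
    private static final Map<String, ToIntFunction<String>> SUB = new HashMap<>();
    static {
        SUB.put("dataset1", k -> k.hashCode() & Integer.MAX_VALUE);
        SUB.put("dataset2", k -> k.length());
    }

    /** Routes a key of the form "tag:realKey" to the rule registered for tag. */
    public static int getPartition(String taggedKey, int numPartitions) {
        int sep = taggedKey.indexOf(':');
        String tag = taggedKey.substring(0, sep);
        String realKey = taggedKey.substring(sep + 1);
        return SUB.get(tag).applyAsInt(realKey) % numPartitions;
    }

    public static void main(String[] args) {
        // Same real key, different datasets -> possibly different partitions.
        System.out.println(getPartition("dataset1:user42", 4));
        System.out.println(getPartition("dataset2:user42", 4));
    }
}
```

The mappers for each dataset would be responsible for emitting the tagged keys, so the partitioner (and, by the same trick, a combiner) can recover the dataset without any extra configuration.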
Harsh, please correct me if I am wrong.

Best,
Mahesh Balija,
Calsoft Labs.

On Mon, Mar 4, 2013 at 8:32 PM, Vikas Jadhav <[email protected]> wrote:

> Thank you for the reply.
>
> Can you please elaborate? I am not getting what the following means in a
> programming environment:
>
> "you will need a custom written "high level" partitioner and combiner
> that can create multiple instances of sub-partitioners/combiners and use
> the most likely one based on their input's characteristics (such as
> instance type, some tag, config., etc.)."
>
> On Sun, Mar 3, 2013 at 4:58 PM, Harsh J <[email protected]> wrote:
>
>> The MultipleInputs class only supports mapper configuration per dataset.
>> It does not let you specify a partitioner and combiner as well. You will
>> need a custom written "high level" partitioner and combiner that can
>> create multiple instances of sub-partitioners/combiners and use the most
>> likely one based on their input's characteristics (such as instance
>> type, some tag, config., etc.).
>>
>> On Sun, Mar 3, 2013 at 4:07 PM, Vikas Jadhav <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> 1) I have multiple types of datasets as input to my Hadoop job, and I
>>> want to write my own InputFormat (e.g., MyTableInputFormat). How do I
>>> specify a mapper, partitioner, and combiner per dataset? I know the
>>> MultiFileInputFormat class, but it won't help if I want to associate a
>>> combiner and partitioner class too; it only sets the mapper class per
>>> dataset.
>>>
>>> 2) I am also looking at the MapTask.java file from the source code. I
>>> just want to know where the mapper, partitioner, and combiner classes
>>> are set for a particular file split while the job is executing.
>>>
>>> Thank you.
>>>
>>> --
>>> Thanx and Regards
>>> Vikas Jadhav
>>
>> --
>> Harsh J
>
> --
> Thanx and Regards
> Vikas Jadhav
