A 'table split' is a region split, and as you split regions and balance them, you should see some parallelism in your M/R jobs.
Of course depending on your choice of row keys... YMMV.

HTH

-Mike

On Aug 26, 2013, at 2:16 AM, Pavan Sudheendra <[email protected]> wrote:

> Hi all,
>
> How to make use of a TableSplit or a Region Split? How is it used in
> TableInputFormatBase#getSplits() ?
>
>
> I have 6 Region Servers across the cluster for the map-reduce task which i
> am using, How to leverage this so that the table is split across the
> clusters and the map-reduce application finishes fast.. Right now, it is
> very slow.. For aggregating 3 table values, 1 with 100,000 rows and other
> two tables i'm only using get operating to get the value by passing the
> key.. For this setup, it takes 40-50 mins.. Which is worse.. The first
> table would eventually be around 20-25m rows.. Please lead me in the right
> way.. I will paste the code if anybody is interested.
>
>
> --
> Regards-
> Pavan

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental.
Use at your own risk.
Michael Segel
michael_segel (AT) hotmail.com
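
To make the point about region splits and map parallelism concrete, here is a rough sketch in plain Java. It is a simplified model and not HBase code: the class `RegionSplitSketch`, the `Range` type, the `splitsFor` helper, and the split points are all made up for illustration. It assumes the default behaviour, as far as I know, where TableInputFormatBase#getSplits() hands back one split per region in the scan range, so the number of regions caps the number of map tasks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model, NOT the HBase API: regions are key ranges, and each
// region is assumed to become exactly one input split (one map task).
public class RegionSplitSketch {

    /** A key range [startKey, endKey); an empty string means "unbounded". */
    static final class Range {
        final String startKey;
        final String endKey;

        Range(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        @Override
        public String toString() {
            return "[" + (startKey.isEmpty() ? "-inf" : startKey)
                    + ", " + (endKey.isEmpty() ? "+inf" : endKey) + ")";
        }
    }

    /** Builds one range per region from the table's sorted split points. */
    static List<Range> splitsFor(List<String> sortedSplitPoints) {
        List<Range> splits = new ArrayList<>();
        String start = ""; // the first region starts at the beginning of the key space
        for (String point : sortedSplitPoints) {
            splits.add(new Range(start, point));
            start = point;
        }
        splits.add(new Range(start, "")); // the last region runs to the end of the key space
        return splits;
    }

    public static void main(String[] args) {
        // A table that was never split is a single region.
        List<String> none = new ArrayList<>();
        // Hypothetical split points giving six regions, e.g. one per region server.
        List<String> five = Arrays.asList("b", "d", "f", "h", "j");

        System.out.println("unsplit table: " + splitsFor(none).size() + " map task(s)");
        List<Range> splits = splitsFor(five);
        System.out.println("split table: " + splits.size() + " map task(s)");
        for (Range r : splits) {
            System.out.println("  " + r);
        }
    }
}
```

In this model a table with no split points gives a single range, so a single map task, while five split points give six. The row key caveat still applies: if most of the row keys fall inside one of those ranges, that one map task ends up doing most of the work no matter how many regions exist.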
