What's the behavior then if you return hash % num_reducers and you have no splits defined? When the reducer writes to the table, does the region server local to the reducer create a new region?
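
To make the question concrete, here is a minimal standalone sketch of the hash-based fallback, assuming CRC32 as the key checksum. It does not use any HBase classes; `SingleRegionFallbackSketch`, `partitionFor`, and the `regionCount` parameter are made-up names for illustration, not HRegionPartitioner's actual code. It only shows how keys would be spread over reducers, and says nothing about what the region servers do on write.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Standalone sketch: no HBase classes are used, and all names are hypothetical.
final class SingleRegionFallbackSketch {

    // Returns the reducer index for a row key.
    // regionCount stands in for startKeys.length in the quoted snippet;
    // numPartitions is assumed to be greater than zero.
    static int partitionFor(byte[] rowKey, int regionCount, int numPartitions) {
        if (regionCount == 1) {
            // Proposed change: spread keys by checksum instead of returning 0.
            CRC32 crc = new CRC32();
            crc.update(rowKey, 0, rowKey.length);
            // getValue() is a non-negative long, so the remainder
            // always falls in [0, numPartitions).
            return (int) (crc.getValue() % numPartitions);
        }
        // The region-aware branch is out of scope for this sketch.
        throw new UnsupportedOperationException("multi-region case not sketched");
    }

    public static void main(String[] args) {
        int numReducers = 4;
        for (String row : new String[] {"stat-001", "stat-002", "stat-003"}) {
            byte[] key = row.getBytes(StandardCharsets.UTF_8);
            System.out.println(row + " -> reducer " + partitionFor(key, 1, numReducers));
        }
    }
}
```

Because the result depends only on the key bytes and the reducer count, all records with the same key still land on the same reducer, which is the guarantee discussed further down the thread.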

Graeme

On Wed, Apr 10, 2013 at 1:26 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:

> So.
>
> I looked at the code, and I have one comment/suggestion here.
>
> If the table we are outputting to has regions, then partitions are built
> around that, and that's fine. But if the table is totally empty with a
> single region, even if we setNumReduceTasks to 2 or more, all the keys will
> go to the same first reducer because of this:
>     if (this.startKeys.length == 1){
>       return 0;
>     }
> I think it will be better to return something like keycrc%numPartitions
> instead. That would still allow the application to spread the reducing
> process over multiple nodes (racks) even if there is only one region in
> the table.
>
> In my use case, I have millions of lines producing some statistics. At the
> end, I will have only about 600 lines, but it will take a lot of map and
> reduce time to go from millions to 600; that's why I'm looking to have more
> than one reducer. However, with only 600 lines, it's very difficult to
> pre-split the table. Keys are all very close.
>
> Does anyone see anything wrong with changing this default behaviour when
> startKeys.length == 1? If not, I will open a JIRA and upload a patch.
>
> JM
>
> 2013/4/10 Jean-Marc Spaggiari <jean-m...@spaggiari.org>
>
> > Thanks Ted.
> >
> > That's exactly where I was looking now. I was close. I will take a
> > deeper look.
> >
> > Thanks Nitin for the link. I will read that too.
> >
> > JM
> >
> > 2013/4/10 Nitin Pawar <nitinpawar...@gmail.com>
> >
> >> To add to what Ted said,
> >>
> >> the same discussion happened on the question Jean asked
> >>
> >> https://issues.apache.org/jira/browse/HBASE-1287
> >>
> >>
> >> On Wed, Apr 10, 2013 at 7:28 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >>
> >> > Jean-Marc:
> >> > Take a look at HRegionPartitioner which is in both mapred and
> >> > mapreduce packages:
> >> >
> >> > * This is used to partition the output keys into groups of keys.
> >> > * Keys are grouped according to the regions that currently exist
> >> > * so that each reducer fills a single region so load is distributed.
> >> >
> >> > Cheers
> >> >
> >> > On Wed, Apr 10, 2013 at 6:54 AM, Jean-Marc Spaggiari <
> >> > jean-m...@spaggiari.org> wrote:
> >> >
> >> > > Hi Nitin,
> >> > >
> >> > > You got my question correctly.
> >> > >
> >> > > However, I'm wondering how it works when it's done in HBase. Do
> >> > > we have default partitioners, so we have the same guarantee that
> >> > > records mapping to one key go to the same reducer? Or do we have
> >> > > to implement this on our own?
> >> > >
> >> > > JM
> >> > >
> >> > > 2013/4/10 Nitin Pawar <nitinpawar...@gmail.com>:
> >> > > > I hope I understood that this is what you are asking. If not,
> >> > > > then pardon me :)
> >> > > > From the Hadoop developer handbook, a few lines:
> >> > > >
> >> > > > The *Partitioner* class determines which partition a given
> >> > > > (key, value) pair will go to. The default partitioner computes
> >> > > > a hash value for the key and assigns the partition based on
> >> > > > this result. It guarantees that all the records mapping to one
> >> > > > key go to the same reducer.
> >> > > >
> >> > > > You can write your custom partitioner as well.
> >> > > > Here is the link:
> >> > > > http://developer.yahoo.com/hadoop/tutorial/module5.html#partitioning
> >> > > >
> >> > > >
> >> > > > On Wed, Apr 10, 2013 at 6:19 PM, Jean-Marc Spaggiari <
> >> > > > jean-m...@spaggiari.org> wrote:
> >> > > >
> >> > > >> Hi,
> >> > > >>
> >> > > >> quick question. How is the data from the map tasks partitioned
> >> > > >> for the reducers?
> >> > > >>
> >> > > >> If there is 1 reducer, it's easy, but if there are more, are
> >> > > >> all the same keys guaranteed to end up on the same reducer? Or
> >> > > >> not necessarily? If they are not, how can we provide a
> >> > > >> partitioning function?
> >> > > >>
> >> > > >> Thanks,
> >> > > >>
> >> > > >> JM
> >> > > >>
> >> > > >
> >> > > >
> >> > > > --
> >> > > > Nitin Pawar
> >> > >
> >> >
> >>
> >>
> >> --
> >> Nitin Pawar
> >>
> >

--
Graeme Wallace
CTO
FareCompare.com
O: 972 588 1414
M: 214 681 9018