Thank you for the explanation. I have a better understanding now :)

On Mon, Jan 10, 2011 at 1:19 PM, Buttler, David <[email protected]> wrote:
> The default way HBase sets up M/R jobs is to make one per partition. So,
> if all of your months are in one partition then you will only have one map
> job. To do something different you would have to change the way splits are
> determined for your job, rather than using the default.
> One nice thing about having random keys is that you can use the defaults
> and just set a filter for your date range. That way you get maximum
> parallelism on the map side. But then you might be constrained by your
> subsequent step if you can't parallelize that nicely.
>
> Dave
>
> -----Original Message-----
> From: Weishung Chung [mailto:[email protected]]
> Sent: Monday, January 10, 2011 11:02 AM
> To: [email protected]
> Subject: Re: customize partitioning of regionserver
>
> Thanks for the reply.
> In my use case, I have to retrieve a range of data usually by month and
> operate on them before reinserting them, so it would be nice if i could
> partition by month but then I don't know how would the partition affect the
> mapreduce job.
>
> On Mon, Jan 10, 2011 at 12:48 PM, Buttler, David <[email protected]> wrote:
>
> > Not to my knowledge. Partitions are dynamically determined. As your table
> > grows, regions become too large and are split roughly in half. This
> > prevents unbalanced regions. Any predetermined partitioning will ultimately
> > fail because you don't know your data as well as you think you do.
> >
> > Dave
> >
> >
> > -----Original Message-----
> > From: Weishung Chung [mailto:[email protected]]
> > Sent: Monday, January 10, 2011 10:14 AM
> > To: [email protected]
> > Subject: customize partitioning of regionserver
> >
> > Does HBase have the capability to partition dataset by range like the MySQL
> > partitioning eg. partition the datetime, row key by month?
> > Thank you.
> >
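
For anyone finding this thread later, the trade-off Dave describes can be sketched as a small self-contained Python model. This is not HBase code: the split points, row keys, and function names are made up for illustration, and it assumes a scan limited to a key range only gets map tasks for the regions that range overlaps.

```python
from bisect import bisect_left, bisect_right


def regions_touched(split_points, start_key, stop_key):
    """Count regions overlapping the half-open row-key range [start_key, stop_key).

    split_points: sorted region boundary keys; N boundaries define N + 1 regions.
    Under the one-map-task-per-region default, this is also the number of map
    tasks a range-restricted job would get (an assumption of this model).
    """
    if start_key >= stop_key:
        return 0
    first = bisect_right(split_points, start_key)  # region holding start_key
    last = bisect_left(split_points, stop_key)     # region holding the last key before stop_key
    return last - first + 1


def full_scan_tasks(split_points):
    """Map tasks for an unrestricted scan: one per region."""
    return len(split_points) + 1


# Hypothetical table whose row keys start with the date and whose regions
# happen to split on month boundaries.
month_splits = ["2011-01", "2011-02", "2011-03"]
january_tasks = regions_touched(month_splits, "2011-01", "2011-02")

# Hypothetical table with random (hashed) row keys: one month's rows are spread
# over all regions, so the job scans every region and filters by date in each.
random_splits = ["4", "8", "c"]
filtered_tasks = full_scan_tasks(random_splits)

print(january_tasks, filtered_tasks)
```

With these made-up boundaries, the date-prefixed layout gives a single map task for January, while the random-key layout gives four tasks that each filter their own region. That is the "maximum parallelism on the map side" point, paid for by reading the whole table.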
