Thank you for the explanation. I have a better understanding now :)

On Mon, Jan 10, 2011 at 1:19 PM, Buttler, David <[email protected]> wrote:

> The default way HBase sets up M/R jobs is to create one map task per
> region (partition).  So, if all of your months are in one region then you
> will only have one map task.  To do something different you would have to
> change the way splits are determined for your job, rather than using the
> default.
> One nice thing about having random keys is that you can use the defaults
> and just set a filter for your date range. That way you get maximum
> parallelism on the map side.  But then you might be constrained by your
> subsequent step if you can't parallelize that nicely.
>
> Dave
>
> -----Original Message-----
> From: Weishung Chung [mailto:[email protected]]
> Sent: Monday, January 10, 2011 11:02 AM
> To: [email protected]
> Subject: Re: customize partitioning of regionserver
>
> Thanks for the reply.
> In my use case, I have to retrieve a range of data, usually by month, and
> operate on it before reinserting it, so it would be nice if I could
> partition by month, but then I don't know how the partitioning would affect
> the mapreduce job.
>
> On Mon, Jan 10, 2011 at 12:48 PM, Buttler, David <[email protected]> wrote:
>
> > Not to my knowledge.  Partitions are dynamically determined. As your table
> > grows, regions become too large and are split roughly in half.  This
> > prevents unbalanced regions.  Any predetermined partitioning will ultimately
> > fail because you don't know your data as well as you think you do.
> >
> > Dave
> >
> >
> > -----Original Message-----
> > From: Weishung Chung [mailto:[email protected]]
> > Sent: Monday, January 10, 2011 10:14 AM
> > To: [email protected]
> > Subject: customize partitioning of regionserver
> >
> > Does HBase have the capability to partition a dataset by range, like
> > MySQL partitioning, e.g. partition a datetime row key by month?
> > Thank you.
> >
>
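
The trade-off discussed in the thread above (one map task per region, so a
month stored in a single region gets a single mapper, while spread-out keys
reach every region) can be sketched with a small simulation. This is only an
illustration in plain Python, not HBase code: the split points, the helper
names region_index and regions_for_scan, and the "day % 4" salt prefix used
as a stand-in for random keys are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right


def region_index(split_points, row_key):
    """Index of the region holding row_key.

    split_points is a sorted list of region start keys (hypothetical values);
    region 0 holds every key below the first split point.
    """
    return bisect_right(split_points, row_key)


def regions_for_scan(split_points, start_key, stop_key):
    """Regions overlapping the half-open key range [start_key, stop_key).

    With one map task per region, the length of this list is the number of
    map tasks a scan over that key range would get.
    """
    if start_key >= stop_key:
        return []
    first = region_index(split_points, start_key)
    last = bisect_left(split_points, stop_key)  # region of the last key below stop_key
    return list(range(first, last + 1))


# Case 1: row keys start with the date, so one month is one contiguous range.
date_splits = ["2010-04", "2010-07", "2010-10"]  # assumed region boundaries
january = regions_for_scan(date_splits, "2011-01", "2011-02")

# Case 2: row keys carry a salt prefix (stand-in for random keys), so the
# rows of one month are scattered across every region.
salt_splits = ["1", "2", "3"]  # assumed region boundaries
salted_keys = [f"{day % 4}-2011-01-{day:02d}" for day in range(1, 32)]
touched = sorted({region_index(salt_splits, key) for key in salted_keys})

print("date-prefixed keys, map tasks for January:", len(january))
print("salted keys, regions holding January rows:", len(touched))
```

In the first case the whole month sits in one region, so the map side has no
parallelism. In the second case every region holds some January rows, so a
full scan with a date filter keeps all mappers busy, at the cost of reading
rows outside the range.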
