I took the org.apache.hadoop.hbase.util.MurmurHash class and want to use
it for my hashing.
Until now I have had key, value pairs (key format <date>_<somedata>).
Using MurmurHash I can compute a hash of my key.
My questions are:
1) What is the way to use the hash? That is, how should the code be
written so that it uses the hash in addition to the key, value pair?
2) Can a different hash function be used for different HBase tables?
What is the way to do it?
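To make question 1 concrete, here is roughly what I imagine a salted row
key would look like. This is only a sketch: the bucket count, the
zero-padded prefix format, and the use of String.hashCode() (as a
self-contained stand-in for MurmurHash) are all my assumptions, not
tested code against our cluster.

```java
// Sketch of "salting" a sequential <date>_<somedata> row key so that
// writes spread across region servers instead of piling onto one region.
public class SaltedKey {
    // Assumption: pick roughly as many buckets as you expect regions.
    static final int NUM_BUCKETS = 16;

    // String.hashCode() stands in for MurmurHash here so the sketch is
    // self-contained; in real HBase code you would hash the key bytes
    // with org.apache.hadoop.hbase.util.MurmurHash instead.
    static String salt(String originalKey) {
        int bucket = Math.abs(originalKey.hashCode() % NUM_BUCKETS);
        // A zero-padded bucket prefix keeps keys sortable within a bucket.
        return String.format("%02d_%s", bucket, originalKey);
    }

    public static void main(String[] args) {
        String rowKey = "20110320_someData";
        System.out.println(salt(rowKey));
    }
}
```

The Put would then be built from the salted key rather than the raw one,
e.g. new Put(Bytes.toBytes(salt(rowKey))); reads for a given date would
have to fan out over all NUM_BUCKETS prefixes.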
Thanks in advance
Oleg.
On Sun, Mar 20, 2011 at 12:25 AM, Ted Yu <[email protected]> wrote:
> I guess you chose the date prefix for query considerations.
> You should introduce hashing so that the row keys are not clustered
> together.
>
> On Sat, Mar 19, 2011 at 3:00 PM, Oleg Ruchovets <[email protected]> wrote:
>
> > We want to insert into HBase on a daily basis (HBase 0.90.1, hadoop
> > append). Currently we have ~10 million records per day. We use
> > map/reduce to prepare the data and write it to HBase in chunks of
> > 5000 puts each. The whole process takes 1h 20m; some tests verified
> > that the HBase writes alone take ~1 hour.
> >
> > I have couple of questions:
> > 1) The reducers write data with keys like <date>_<some_text>; the
> > strange thing is that all records were written to a single node.
> >
> > Is that correct behaviour? What is the way to get a better
> > distribution across the cluster? During the insertion process I saw
> > that the one node receiving all the data took most of the load, while
> > the other nodes had almost no resource utilisation (CPU, I/O ...).
> >
> > Oleg.
> >
>