I took the org.apache.hadoop.hbase.util.MurmurHash class and want to use
it for my hashing.
Until now I have had key, value pairs (key format <date>_<somedata>).
Using MurmurHash I can compute a hash of my key.
My questions are:
1) What is the way to use the hash? That is, how should the code be
written so that it uses the hash in addition to the key, value pair?
2) Can a different hash function be used for different HBase tables?
What is the way to do it?
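To make question 1 concrete, here is roughly what I imagine a salted row
key would look like. This is only a sketch: the bucket count, the
zero-padded prefix format, and the use of String.hashCode() (as a
self-contained stand-in for MurmurHash) are all my assumptions, not
tested code against our cluster.

```java
// Sketch of "salting" a sequential <date>_<somedata> row key so that
// writes spread across region servers instead of piling onto one region.
public class SaltedKey {
    // Assumption: pick roughly as many buckets as you expect regions.
    static final int NUM_BUCKETS = 16;

    // String.hashCode() stands in for MurmurHash here so the sketch is
    // self-contained; in real HBase code you would hash the key bytes
    // with org.apache.hadoop.hbase.util.MurmurHash instead.
    static String salt(String originalKey) {
        int bucket = Math.abs(originalKey.hashCode() % NUM_BUCKETS);
        // A zero-padded bucket prefix keeps keys sortable within a bucket.
        return String.format("%02d_%s", bucket, originalKey);
    }

    public static void main(String[] args) {
        String rowKey = "20110320_someData";
        System.out.println(salt(rowKey));
    }
}
```

The Put would then be built from the salted key rather than the raw one,
e.g. new Put(Bytes.toBytes(salt(rowKey))); reads for a given date would
have to fan out over all NUM_BUCKETS prefixes.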
Thanks in advance
Oleg.
On Sun, Mar 20, 2011 at 12:25 AM, Ted Yu <[email protected]> wrote:
> I guess you chose the date prefix for query considerations.
> You should introduce hashing so that the row keys are not clustered
> together.
>
> On Sat, Mar 19, 2011 at 3:00 PM, Oleg Ruchovets <[email protected]> wrote:
>
> > We want to insert into HBase on a daily basis (HBase 0.90.1, hadoop
> > append). Currently we have ~10 million records per day. We use
> > map/reduce to prepare the data and write it to HBase in chunks of
> > 5000 puts each. The whole process takes 1h 20m; some tests verified
> > that the HBase writes alone take ~1 hour.
> >
> > I have couple of questions:
> > 1) The reducers write data with keys like <date>_<some_text>; the
> > strange thing is that all records were written to a single node.
> >
> > Is that correct behaviour? What is the way to get a better
> > distribution across the cluster? During the insertion process I saw
> > that the one node receiving all the data took most of the load, while
> > the other nodes had almost no resource utilisation (CPU, I/O ...).
> >
> > Oleg.
> >
>