Good point,
let me explain the process. We chose keys of the form <date>_<somedata>
because after insertion we run scans and want to analyse data related to a
specific date.
Can you provide more details on hashing, and how I can scan HBase data for a
specific date when the keys are hashed?
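
To make the question concrete, here is a rough sketch of the salting idea Ted suggests, assuming the <date>_<somedata> layout from this thread. The class name, N_BUCKETS, and the two-digit prefix are illustrative assumptions, not HBase API:

```java
// Rough sketch of salted row keys; N_BUCKETS and the key layout
// are assumptions for illustration, not part of the HBase API.
public class SaltedKeys {
    static final int N_BUCKETS = 16; // assumed bucket count

    // Prepend a salt bucket derived from the full key, so rows for one
    // date spread over N_BUCKETS regions instead of one hot node.
    static String saltedKey(String date, String someData) {
        String original = date + "_" + someData;
        int bucket = Math.floorMod(original.hashCode(), N_BUCKETS);
        return String.format("%02d_%s", bucket, original);
    }

    // To read one date back, run N_BUCKETS prefix scans (one per bucket)
    // and merge the results client-side; each prefix would bound a
    // Scan's start/stop row in the real HBase client.
    static String[] scanPrefixesFor(String date) {
        String[] prefixes = new String[N_BUCKETS];
        for (int b = 0; b < N_BUCKETS; b++) {
            prefixes[b] = String.format("%02d_%s", b, date);
        }
        return prefixes;
    }
}
```

The trade-off is that a per-date read becomes N_BUCKETS scans instead of one, in exchange for writes no longer hotspotting a single region.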

Oleg.

On Sun, Mar 20, 2011 at 12:25 AM, Ted Yu <[email protected]> wrote:

> I guess you chose a date prefix for query considerations.
> You should introduce hashing so that the row keys are not clustered
> together.
>
> On Sat, Mar 19, 2011 at 3:00 PM, Oleg Ruchovets <[email protected]> wrote:
>
> >   We want to insert into HBase on a daily basis (HBase 0.90.1, Hadoop
> > append). Currently we have ~10 million records per day. We use
> > map/reduce to prepare the data, and write it to HBase in chunks
> > (5000 puts per chunk).
> >   The whole process takes 1h 20 minutes. Some tests verified that
> > writing to HBase takes ~1 hour.
> >
> > I have a couple of questions:
> >  1) The reducers write data with keys like <date>_<some_text>;
> > strangely, all records were written to a single node.
> >
> >    Is this correct behaviour? What is the way to get better distribution
> > across the cluster? During the insertion process I saw that the one node
> > where all the data was inserted got most of the load, while all the
> > other nodes had almost no resource utilisation (CPU, I/O, ...).
> >
> > Oleg.
> >
>
