A timestamp is stored in every key-value pair. Take a look at this method in Scan:

public Scan setTimeRange(long minStamp, long maxStamp)
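To make the suggestion concrete: a scan restricted to one day only needs the [minStamp, maxStamp) epoch-millis boundaries of that day. A minimal sketch, assuming UTC timestamps; the `dayRange` helper is a hypothetical name for this example, not part of the HBase API (the actual HBase calls are shown only in comments):

```java
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class DayRange {
    // Returns {minStamp, maxStamp} in epoch millis for a yyyy-MM-dd day (UTC).
    static long[] dayRange(String day) throws Exception {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        long min = fmt.parse(day).getTime();
        long max = min + 24L * 60 * 60 * 1000;  // exclusive upper bound
        return new long[] { min, max };
    }

    public static void main(String[] args) throws Exception {
        long[] range = dayRange("2011-03-19");
        System.out.println(range[0] + " " + range[1]);
        // With the HBase client this range would be applied as:
        //   Scan scan = new Scan();
        //   scan.setTimeRange(range[0], range[1]);
        // so the scan returns only cells written that day, regardless of
        // how the row key itself is laid out.
    }
}
```

This decouples "filter by date" from the row-key layout, which is what lets the key be salted for write distribution.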
Cheers

On Sat, Mar 19, 2011 at 3:43 PM, Oleg Ruchovets <[email protected]> wrote:

> Good point,
> let me explain the process. We chose the keys <date>_<somedata>
> because after insertion we run scans and want to analyse data related
> to a specific date.
> Can you provide more details on hashing, and how can I scan hbase data
> per specific date using it?
>
> Oleg.
>
> On Sun, Mar 20, 2011 at 12:25 AM, Ted Yu <[email protected]> wrote:
>
> > I guess you chose the date prefix for query considerations.
> > You should introduce hashing so that the row keys are not clustered
> > together.
> >
> > On Sat, Mar 19, 2011 at 3:00 PM, Oleg Ruchovets <[email protected]> wrote:
> >
> > > We want to insert into hbase on a daily basis (hbase 0.90.1, hadoop
> > > append). Currently we have ~10 million records per day. We use
> > > map/reduce to prepare the data and write it to hbase in chunks
> > > (5000 puts per chunk). The whole process takes 1h 20 minutes; tests
> > > verified that writing to hbase alone takes ~1 hour.
> > >
> > > I have a couple of questions:
> > > 1) The reducers write data with keys like <date>_<some_text>, and
> > > strangely all records were written to one node.
> > >
> > > Is this correct behaviour? What is the way to get a better
> > > distribution across the cluster? During the insertion process I saw
> > > that the one node receiving all the data carried most of the load,
> > > while all other nodes had almost no resource utilisation
> > > (cpu, I/O ...).
> > >
> > > Oleg.
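The hashing Ted suggests is usually done by salting: prepend a small hash-derived bucket to the `<date>_<somedata>` key so consecutive writes for one day spread across regions instead of hammering a single region server. A minimal sketch of the idea; `BUCKETS` and `saltedKey` are hypothetical names chosen for this example, not HBase APIs:

```java
public class SaltedKeys {
    static final int BUCKETS = 16;  // e.g. roughly the number of region servers

    // Derive a stable bucket from the original key and prepend it,
    // zero-padded so keys still sort lexicographically within a bucket.
    static String saltedKey(String originalKey) {
        int bucket = Math.floorMod(originalKey.hashCode(), BUCKETS);
        return String.format("%02d_%s", bucket, originalKey);
    }

    public static void main(String[] args) {
        // Writes for the same date now spread across up to BUCKETS regions.
        System.out.println(saltedKey("20110319_user42"));

        // The trade-off: to read everything for one date, issue one scan
        // per bucket instead of a single contiguous scan:
        for (int b = 0; b < BUCKETS; b++) {
            String startRow = String.format("%02d_20110319_", b);
            String stopRow  = String.format("%02d_20110319`", b); // '`' sorts just after '_'
            // With the HBase client, per bucket:
            //   new Scan(Bytes.toBytes(startRow), Bytes.toBytes(stopRow))
        }
    }
}
```

The salt must be deterministic (derived from the key, not random) so the same logical row always maps to the same bucket on both write and read.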
