A timestamp is stored in every key-value pair. Take a look at this method in Scan:

public Scan setTimeRange(long minStamp, long maxStamp)
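To make the suggestion concrete: a scan restricted to one day only needs the [minStamp, maxStamp) epoch-millis boundaries of that day. A minimal sketch, assuming UTC timestamps; the `dayRange` helper is a hypothetical name for this example, not part of the HBase API (the actual HBase calls are shown only in comments):

```java
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class DayRange {
    // Returns {minStamp, maxStamp} in epoch millis for a yyyy-MM-dd day (UTC).
    static long[] dayRange(String day) throws Exception {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        long min = fmt.parse(day).getTime();
        long max = min + 24L * 60 * 60 * 1000;  // exclusive upper bound
        return new long[] { min, max };
    }

    public static void main(String[] args) throws Exception {
        long[] range = dayRange("2011-03-19");
        System.out.println(range[0] + " " + range[1]);
        // With the HBase client this range would be applied as:
        //   Scan scan = new Scan();
        //   scan.setTimeRange(range[0], range[1]);
        // so the scan returns only cells written that day, regardless of
        // how the row key itself is laid out.
    }
}
```

This decouples "filter by date" from the row-key layout, which is what lets the key be salted for write distribution.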
Cheers

On Sat, Mar 19, 2011 at 3:43 PM, Oleg Ruchovets <[email protected]> wrote:

> Good point,
> let me explain the process. We chose the keys <date>_<somedata>
> because after insertion we run scans and want to analyse data related
> to a specific date.
> Can you provide more details on hashing, and how can I scan hbase data
> per specific date using it?
>
> Oleg.
>
> On Sun, Mar 20, 2011 at 12:25 AM, Ted Yu <[email protected]> wrote:
>
> > I guess you chose the date prefix for query considerations.
> > You should introduce hashing so that the row keys are not clustered
> > together.
> >
> > On Sat, Mar 19, 2011 at 3:00 PM, Oleg Ruchovets <[email protected]> wrote:
> >
> > > We want to insert into hbase on a daily basis (hbase 0.90.1, hadoop
> > > append). Currently we have ~10 million records per day. We use
> > > map/reduce to prepare the data and write it to hbase in chunks
> > > (5000 puts per chunk). The whole process takes 1h 20 minutes; tests
> > > verified that writing to hbase alone takes ~1 hour.
> > >
> > > I have a couple of questions:
> > > 1) The reducers write data with keys like <date>_<some_text>, and
> > > strangely all records were written to one node.
> > >
> > > Is this correct behaviour? What is the way to get a better
> > > distribution across the cluster? During the insertion process I saw
> > > that the one node receiving all the data carried most of the load,
> > > while all other nodes had almost no resource utilisation
> > > (cpu, I/O ...).
> > >
> > > Oleg.
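The hashing Ted suggests is usually done by salting: prepend a small hash-derived bucket to the `<date>_<somedata>` key so consecutive writes for one day spread across regions instead of hammering a single region server. A minimal sketch of the idea; `BUCKETS` and `saltedKey` are hypothetical names chosen for this example, not HBase APIs:

```java
public class SaltedKeys {
    static final int BUCKETS = 16;  // e.g. roughly the number of region servers

    // Derive a stable bucket from the original key and prepend it,
    // zero-padded so keys still sort lexicographically within a bucket.
    static String saltedKey(String originalKey) {
        int bucket = Math.floorMod(originalKey.hashCode(), BUCKETS);
        return String.format("%02d_%s", bucket, originalKey);
    }

    public static void main(String[] args) {
        // Writes for the same date now spread across up to BUCKETS regions.
        System.out.println(saltedKey("20110319_user42"));

        // The trade-off: to read everything for one date, issue one scan
        // per bucket instead of a single contiguous scan:
        for (int b = 0; b < BUCKETS; b++) {
            String startRow = String.format("%02d_20110319_", b);
            String stopRow  = String.format("%02d_20110319`", b); // '`' sorts just after '_'
            // With the HBase client, per bucket:
            //   new Scan(Bytes.toBytes(startRow), Bytes.toBytes(stopRow))
        }
    }
}
```

The salt must be deterministic (derived from the key, not random) so the same logical row always maps to the same bucket on both write and read.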
