We want to insert into HBase on a daily basis (HBase 0.90.1, Hadoop append).
Currently we have ~10 million records per day. We use map/reduce to prepare
the data and write it to HBase in chunks (5000 puts per chunk).
   The whole process takes 1h 20 minutes; some tests verified that writing
to HBase alone takes ~1 hour.
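For reference, the chunked-write loop looks roughly like the sketch below (plain Java, with the actual HBase call left as a comment; in the 0.90 API the flush would be something like `HTable.put(List<Put>)` followed by `flushCommits()` — names assumed, not taken from our code):

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedWriter {
    static final int CHUNK_SIZE = 5000;

    // Split the prepared records into fixed-size chunks; each chunk is
    // flushed to HBase in one call instead of one put per record.
    static <T> List<List<T>> chunk(List<T> records, int size) {
        List<List<T>> chunks = new ArrayList<List<T>>();
        for (int i = 0; i < records.size(); i += size) {
            chunks.add(records.subList(i, Math.min(i + size, records.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // 12,000 dummy records -> 3 chunks: 5000, 5000, 2000
        List<Integer> records = new ArrayList<Integer>();
        for (int i = 0; i < 12000; i++) records.add(i);
        List<List<Integer>> chunks = chunk(records, CHUNK_SIZE);
        for (List<Integer> c : chunks) {
            // here we would call table.put(c) / table.flushCommits()
            System.out.println(c.size());
        }
    }
}
```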

I have a couple of questions:
  1) The reducers write data with keys of the form <date>_<some_text>.
The strange thing is that all records were written to a single node.

    Is this correct behaviour? What is the way to get a better distribution
across the cluster? During the insertion process I saw that most of the load
went to the specific node where all the data was inserted, while all the
other nodes had almost no resource utilisation (CPU, I/O, ...).
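My guess is that because every key for a given day starts with the same <date> prefix, all the rows sort next to each other and land in one region. One idea would be to salt the key with a hash-derived bucket prefix so consecutive rows spread across regions. A minimal sketch (bucket count and names are my own assumptions, not anything from our code):

```java
public class SaltedKey {
    // Number of buckets, e.g. roughly the number of region servers (assumption).
    static final int NUM_BUCKETS = 8;

    // Prefix the date-based key with a hash-derived bucket so rows from the
    // same day no longer sort into a single region.
    static String saltedKey(String date, String text) {
        String original = date + "_" + text;
        int bucket = Math.abs(original.hashCode() % NUM_BUCKETS);
        return bucket + "_" + original;
    }

    public static void main(String[] args) {
        System.out.println(saltedKey("20110405", "recordA"));
        System.out.println(saltedKey("20110405", "recordB"));
    }
}
```

The trade-off is that a scan over one day then needs NUM_BUCKETS scans (one per prefix) instead of a single range scan.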

Oleg.
