Hi All,

I am trying to load data from my OLTP system into HBase. I am using
checkAndPut() to do this.

The reason I am using checkAndPut() rather than put() is that the system I am
writing has idempotence requirements: a value is initially written
with a start state, and then with an end state. Once the value has reached
its end state, no further values may be accepted for that row. So
basically, I am only interested in having 2 versions of a row, and no
newer versions will be accepted.
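
To make the rule concrete, here is a minimal in-memory sketch of the semantics I am after. It is not my actual script: the class, enum and method names are made up for illustration, and a ConcurrentHashMap stands in for the HBase table. It also assumes that an end state arriving before any start state is rejected, and that a replayed start state is ignored. In the real script the check and the write happen in a checkAndPut() call against HBase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory model only (hypothetical names): the map stands in for the HBase table.
final class StateStore {
    enum State { START, END }

    // Row key -> accepted states, oldest first (at most START, then END).
    private final Map<String, List<State>> rows = new ConcurrentHashMap<>();

    /** Returns true if the write was accepted, false if it was rejected. */
    boolean offer(String rowKey, State incoming) {
        boolean[] accepted = {false};
        // compute() runs atomically per key, which models a check and a
        // write being applied to one row as a single step.
        rows.compute(rowKey, (key, versions) -> {
            if (versions == null) {
                // Assumption: only a START may create the row.
                if (incoming != State.START) {
                    return null; // leave the row absent
                }
                versions = new ArrayList<>();
            } else if (versions.size() != 1 || incoming != State.END) {
                // Row is already final, or this is a replayed START: no change.
                return versions;
            }
            versions.add(incoming);
            accepted[0] = true;
            return versions;
        });
        return accepted[0];
    }

    /** Number of versions stored for the row (0, 1 or 2). */
    int versionCount(String rowKey) {
        List<State> versions = rows.get(rowKey);
        return versions == null ? 0 : versions.size();
    }

    public static void main(String[] args) {
        StateStore store = new StateStore();
        System.out.println(store.offer("row-1", State.START)); // true
        System.out.println(store.offer("row-1", State.END));   // true
        System.out.println(store.offer("row-1", State.END));   // false
        System.out.println(store.versionCount("row-1"));       // 2
    }
}
```

Replaying the same record any number of times leaves the row unchanged once it holds both versions, which is the idempotence I need.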

I left the script running overnight. So far, only 980,000 records
have been written into HBase over a span of more than 10 hours (at most
about 27 records per second), and the script is still running. Is the
performance bad because I am using checkAndPut() and not
table.put(List<Put>)? Is the performance of checkAndPut() always going to
be this slow?

My script is simple. It loads a page of records (about 5000) from the OLTP
database into memory and executes checkAndPut() on each one.

I have not called table.setAutoFlush(false). I can go that route, but even
without client-side buffering/flushing, I would not expect the performance
to be this slow.

Any pointers or help would be appreciated. Since the script is still
running, I can provide logs from the regionserver if needed.

Thanks,

Sam
