Hi All, I am trying to load data from my OLTP system into HBase. I am using checkAndPut() to do this.
The reason I am using checkAndPut() and not put() is that the system I am writing has idempotence requirements, i.e. a value is initially written with a start state and later with an end state. Once the value has reached its end state, no other new values must be accepted for that row. So basically, I am only interested in having 2 versions of a row, and no newer versions should be accepted.

I left the script running overnight. So far, only 980,000 records have been written into HBase (over a span of more than 10 hours), and the script is still running.

Is the performance bad because I am using checkAndPut() and not table.put(List<Put>)? Is the performance of checkAndPut() always going to be this slow?

My script is simple. It loads a page of records (about 5000) from the OLTP database into memory and executes checkAndPut() on each one. I do not have table.setAutoFlush() set to false. I can decide to go that route, but even without buffering/flushing, I do not expect the performance to be this slow.

Any pointers/help is appreciated. Since the script is still running, I can provide logs from the regionserver if needed.

Thanks,
Sam
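
P.S. In case it helps to see the idempotence rule spelled out, here is a minimal sketch of what I am trying to enforce. It is NOT my actual script and it does not use the HBase client at all: an in-memory map stands in for the table, and the class name, the method names and the "START"/"END" state names are made up for illustration. It also assumes the end state is only accepted after the start state has been written. The conditional map operations play the role that checkAndPut() plays against the real table.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory stand-in for the HBase table. It only illustrates the
// start -> end state rule; it is not the HBase client API.
class IdempotentStateStore {
    // Hypothetical state names, chosen for this sketch.
    static final String START = "START";
    static final String END = "END";

    private final ConcurrentMap<String, String> rows = new ConcurrentHashMap<>();

    // Returns true if the write was accepted, false if it was rejected.
    boolean write(String rowKey, String newState) {
        if (START.equals(newState)) {
            // Accept the start state only if the row does not exist yet
            // (a conditional put that expects "no current value").
            return rows.putIfAbsent(rowKey, START) == null;
        }
        if (END.equals(newState)) {
            // Accept the end state only if the row is currently in the
            // start state (a conditional put that expects START).
            return rows.replace(rowKey, START, END);
        }
        // Any other state is rejected.
        return false;
    }

    // Current state of the row, or null if the row was never written.
    String current(String rowKey) {
        return rows.get(rowKey);
    }
}
```

With this rule, writing START and then END for a row succeeds once each, and any later write to that row (a repeated END, or a START replayed after END) is rejected, which is the behaviour I need from the real table.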
