Hi,
We are having some problems when performing a large number of deletes.

We are using Apache HBase 0.90 with the Cloudera distribution for HDFS (CDH3b3). We store the inverted index of some documents in HBase. We get constant throughput when inserting documents and when scanning the table, but we have problems when deletes are performed.

We ran 10,000 test iterations, each performing the following operations:
- scan a few rows (2-3 rows)
- delete some columns from the previously scanned rows (one column deleted per test)
- add some columns to the previously scanned rows
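For reference, one test iteration looks roughly like the sketch below, written against the HBase 0.90 client API (the table, family, and qualifier names are placeholders, not our real schema):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteTestIteration {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "inverted_index"); // placeholder table name
        byte[] family = Bytes.toBytes("terms");            // placeholder family

        // 1. scan a few rows
        Scan scan = new Scan(Bytes.toBytes("row0"), Bytes.toBytes("row3"));
        scan.addFamily(family);
        ResultScanner scanner = table.getScanner(scan);
        for (Result r : scanner) {
            byte[] row = r.getRow();

            // 2. delete one column from the scanned row
            Delete del = new Delete(row);
            del.deleteColumns(family, Bytes.toBytes("doc42")); // placeholder qualifier
            table.delete(del);

            // 3. add one column to the same row
            Put put = new Put(row);
            put.add(family, Bytes.toBytes("doc43"), Bytes.toBytes("1"));
            table.put(put);
        }
        scanner.close();
        table.close();
    }
}
```

We time each scan, delete, and put individually with System.currentTimeMillis() around the calls.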

We saw a large performance degradation in both scans and deletes:
- scan time grew from 5 to 15 milliseconds over the first 10,000 operations
- delete time grew from 4 to 13 milliseconds
and performance kept degrading. After 30,000 operations a scan took 45 milliseconds and a delete 16 milliseconds.

The put operation stayed (almost) constant in time (3.99 ms rising only to 4.5 ms after 30,000 tests).

In longer-running tests we saw the same performance degradation, but at some point performance seems to recover on its own. Could this be caused by a major compaction, a flush to disk, or something else?
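If the recovery does correspond to a major compaction, we could presumably confirm it by forcing one from the HBase shell and re-running the timings (the table name below is a placeholder):

```shell
# Force a major compaction of the table; deleted cells and delete
# markers should be purged once it completes.
echo "major_compact 'inverted_index'" | hbase shell
```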

In what way do deletes affect the scan operation? How can we minimize these effects? Is there an operation that will put the database back into an optimal state?

Thank you,
Iulia



--
Iulia Zidaru
Java Developer

1&1 Internet AG - Bucharest/Romania - Web Components Romania
18 Mircea Eliade St
Sect 1, Bucharest
RO Bucharest, 012015
[email protected]
0040 31 223 9153
