Hi,
We are having problems when performing a large number of deletes.
We are using ASF HBase 0.90 with the Cloudera distribution for
HDFS (CDH3b3). We store the inverted index of some documents in HBase.
We get constant throughput when inserting documents and scanning the
table, but we run into problems once deletes are performed.
We ran 10,000 test iterations, each performing the following operations:
- scan a few rows (2-3)
- delete some columns for the previously scanned rows (one column
deleted per test)
- add some columns for the previously scanned rows
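For reference, one test iteration looks roughly like the sketch below, written against the HBase 0.90 client API. The table, family, qualifier, and row names here are placeholders for illustration, not our actual schema:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteTestIteration {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "inverted_index"); // placeholder table name
        byte[] family = Bytes.toBytes("terms");            // placeholder family

        // 1. Scan a few rows (2-3).
        Scan scan = new Scan(Bytes.toBytes("doc0000"), Bytes.toBytes("doc0003"));
        ResultScanner scanner = table.getScanner(scan);
        for (Result r : scanner) {
            // 2. Delete one column for the scanned row (one column per test).
            Delete d = new Delete(r.getRow());
            d.deleteColumn(family, Bytes.toBytes("term_to_remove"));
            table.delete(d);

            // 3. Add a column for the same row.
            Put p = new Put(r.getRow());
            p.add(family, Bytes.toBytes("term_to_add"), Bytes.toBytes("1"));
            table.put(p);
        }
        scanner.close();
        table.close();
    }
}
```

This is only a sketch of the access pattern; it needs a running HBase cluster and the matching client jars on the classpath to execute.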
We saw a large degradation in performance for both scans and deletes:
- a scan went from 5 to 15 milliseconds (over the first 10,000 operations)
- a delete went from 4 to 13 milliseconds
and the times kept growing: after 30,000 operations a scan took 45
milliseconds and a delete 16 milliseconds.
The put operation stayed (almost) constant, going from 3.99 ms to 4.5 ms
after 30,000 tests.
In longer-running tests we saw the same degradation, but at some point
performance seems to recover on its own. Could this be a major
compaction, a memstore flush to disk, or something else?
How do deletes affect the scan operation? How can we minimize these
effects? Is there an operation that would put the table back in an
optimal state?
Thank you,
Iulia
--
Iulia Zidaru
Java Developer
1&1 Internet AG - Bucharest/Romania - Web Components Romania
18 Mircea Eliade St
Sect 1, Bucharest
RO Bucharest, 012015
[email protected]
0040 31 223 9153