Hi all,
I have run into various issues with big tables while experimenting over
the last couple of weeks.
My impression so far is that HBase (+ Phoenix) only works well when
there is a fairly powerful cluster, say when half of the data fits into
the combined memory of the servers and the disks are fast (SSDs?). It
does not seem to cope once tables are twice as large as the memory
allocated to the region servers (frankly, I think the threshold is even
lower).
Things that constantly fail:
- non-trivial queries on large tables (group by, counts, joins): the
region servers hit out-of-memory errors or crash for no apparent reason
with -Xmx set to 4G or 8G (an example of the kind of query is sketched
after this list)
- index creation on the same big tables: it always fails, I think
around the point where HBase has to flush its memstores to disk, and I
couldn't find a solution (also sketched below)
- Spark jobs fail unless they are throttled to feed HBase only as much
data as it can take. Is there no backpressure? (My write path is
sketched below as well.)
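For reference, the statements that fail are nothing exotic. Roughly
(table and column names here are made up for illustration, not my real
schema):

    -- the kind of aggregate query that takes a region server down
    SELECT category, COUNT(*), SUM(amount)
    FROM BIG_TABLE
    GROUP BY category;

    -- the kind of index creation that dies around memstore flush time
    CREATE INDEX idx_big_category ON BIG_TABLE (category) INCLUDE (amount);

If I read the Phoenix docs right, large indexes are supposed to be
built asynchronously with the ASYNC keyword plus the IndexTool
MapReduce job, i.e. something like:

    CREATE INDEX idx_big_category ON BIG_TABLE (category) INCLUDE (amount) ASYNC;
    -- then, from the shell:
    --   hbase org.apache.phoenix.mapreduce.index.IndexTool \
    --     --data-table BIG_TABLE --index-table IDX_BIG_CATEGORY \
    --     --output-path /tmp/idx_big_category

Is that the expected route for tables of this size, or should the plain
CREATE INDEX work too?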
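On the Spark side, the write path is roughly the following (PySpark
with the phoenix-spark connector; the table name, zkUrl, paths and
partition count are placeholders). The only crude throttle I have is
repartitioning the DataFrame down before the save:

    from pyspark.sql import SparkSession

    # placeholder app name and paths, just to show the shape of the job
    spark = SparkSession.builder.appName("load-big-table").getOrCreate()
    df = spark.read.parquet("hdfs:///staging/big_table")

    (df.repartition(16)                  # fewer partitions = fewer concurrent writers
       .write
       .format("org.apache.phoenix.spark")
       .mode("overwrite")                # the phoenix-spark connector expects overwrite; it upserts
       .option("table", "BIG_TABLE")
       .option("zkUrl", "zk1:2181")
       .save())

Without throttling of some kind, the job eventually fails as described
above.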
There were no replies to my earlier emails about these issues, which
makes me think there are no solutions (or the solutions are hard to
find and not many people know them).
So, after 21 tweaks to the default config, I am still not able to
operate it like a normal database.
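For reference, these are roughly the kinds of knobs I have been turning
(a sketch, not my exact values and certainly not a recommendation):

    # hbase-env.sh
    export HBASE_HEAPSIZE=8192                      # heap for the HBase daemons (MB on older releases)

    # hbase-site.xml (written as key = value for brevity)
    hbase.regionserver.global.memstore.size = 0.4   # share of heap for memstores
    hfile.block.cache.size = 0.3                    # share of heap for the block cache
    hbase.hregion.memstore.flush.size = 134217728   # 128 MB flush threshold
    hbase.rpc.timeout = 600000                      # long Phoenix scans time out otherwise
    phoenix.query.timeoutMs = 600000
    phoenix.query.maxGlobalMemoryPercentage = 15    # Phoenix server-side memory manager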
Should I conclude that my config is all wrong, or that HBase + Phoenix
only works when the cluster is powerful enough relative to the data?
I believe it is a great project and the functionality is really useful.
What's lacking is something like three sample configs for clusters of
three different sizes.
Thanks