You might get some more traction on user@phoenix since you're not really
asking an HBase-specific question here.
Phoenix doesn't have any native capabilities to create/maintain
materialized views for you, but, if your data sets change infrequently,
you could manage that aspect on your own.
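
To make "manage that aspect on your own" a bit more concrete, here is a
minimal sketch in plain Python. It is not Phoenix or HBase API: the
ManualView class, the column_means helper, and the integer source version
are all hypothetical names for illustration, and the dicts stand in for
tables. It also assumes aggregates are taken over stored cells only. The
idea is just that the application rebuilds the derived data when it knows
the source has changed.

```python
from statistics import mean


def column_means(rows):
    """Compute the mean of each column over the cells that are present."""
    values = {}
    for cells in rows.values():
        for col, value in cells.items():
            values.setdefault(col, []).append(value)
    return {col: mean(vals) for col, vals in values.items()}


class ManualView:
    """Hypothetical stand-in for a derived table the application maintains."""

    def __init__(self, build):
        self._build = build          # function: source rows -> derived data
        self._built_version = None   # source version the view was built from
        self._data = None

    def get(self, source_rows, source_version):
        # Rebuild only when the source version differs from the last build.
        if self._built_version != source_version:
            self._data = self._build(source_rows)
            self._built_version = source_version
        return self._data


# Sparse matrix as row key -> {column name: value}; absent cells are omitted.
matrix = {"r1": {"c1": 2, "c2": 4}, "r2": {"c1": 6}}
view = ManualView(column_means)
means_v1 = view.get(matrix, source_version=1)
```

Whoever writes to the matrix would bump the version (or trigger the rebuild
directly); reads in between are served from the already-built copy.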
On 9/20/19 1:18 PM, Gautham Acharya wrote:
Hi,
Currently I'm using HBase to store large, sparse matrices of integers with
50,000 columns and 10+ million rows.
This matrix is used for fast, random access - we need to be able to fetch
random row/column subsets, as well as entire columns. We also want to very
quickly fetch aggregates (mean, median, etc.) on this matrix.
The data does not change very often for these matrices, so pre-computing is
very feasible here. What I would like to do is maintain a column store (store
the column names as row keys, and a compressed list of all the row values) for
the use case where we select an entire column. Additionally, I would like to
maintain a separate table for each precomputed aggregate (median table, mean
table, etc.).
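
The column store layout described above could look roughly like the sketch
below. It is an in-memory illustration only: the function names are
hypothetical, a dict stands in for the table, and JSON plus zlib is just one
assumed choice of encoding for the "compressed list" of row values.

```python
import json
import zlib
from collections import defaultdict


def build_column_store(matrix):
    """Transpose row -> {column: value} into column -> compressed blob."""
    columns = defaultdict(dict)
    for row_key, cells in matrix.items():
        for col, value in cells.items():
            columns[col][row_key] = value
    # One entry per column: the column name is the key, and the value is
    # a compressed encoding of every stored row value in that column.
    return {
        col: zlib.compress(json.dumps(cells, sort_keys=True).encode("utf-8"))
        for col, cells in columns.items()
    }


def read_column(store, col):
    """Fetch an entire column with a single key lookup."""
    return json.loads(zlib.decompress(store[col]).decode("utf-8"))


# Sparse matrix as row key -> {column name: value}; absent cells are omitted.
example_matrix = {"r1": {"c1": 2, "c2": 4}, "r2": {"c1": 6}}
column_store = build_column_store(example_matrix)
column_c1 = read_column(column_store, "c1")
```

Selecting a whole column then costs one lookup plus one decompression,
instead of a scan over every row of the original matrix.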
The query time for all these use cases needs to be low latency - under 100ms.
When the data does change for a certain matrix, it would be nice to easily
update the optimized table. Ideally, I would like the column store/aggregation
tables to just be materialized views of the original matrix. It doesn't look
like Apache Phoenix supports materialized views. It looks like Hive does, but
unfortunately Hive doesn't normally offer low latency queries.
Maybe Hive can create the materialized view, and we can just query the
underlying HBase store for lower latency responses?
What would be a good solution for this?
--gautham