Re: Materialized views in HBase/Phoenix

2019-09-27 Thread Pedro Boado
For 2), how many rows per column are we talking about in this sparse matrix? I mean, how sparse is it? If I understand correctly, you are talking about storing the transposed matrix for this use case. If rows are too large (and they will keep growing with table size), it will end up polluting the…
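
For reference, a minimal sketch of the transposed layout under discussion, written against the happybase client; the host, table, and column-family names are invented for illustration. One HBase row per matrix column makes "get an entire column" a single Get, but, as Pedro notes, that row widens as the source table grows.

```python
# Hypothetical column-major (transposed) layout: one HBase row per matrix
# column, one qualifier per matrix row. Names below are placeholders.
import happybase

connection = happybase.Connection('hbase-host')      # assumed Thrift gateway
transposed = connection.table('matrix_transposed')   # assumed table name

def write_cell(matrix_row: str, matrix_col: str, value: float) -> None:
    # Row key = matrix column id; each matrix row becomes a qualifier in 'v'.
    transposed.put(
        matrix_col.encode(),
        {b'v:' + matrix_row.encode(): str(value).encode()},
    )

def read_column(matrix_col: str) -> dict:
    # Fetching an entire matrix column is a single Get against one row,
    # the same row that keeps growing as new matrix rows arrive.
    return transposed.row(matrix_col.encode())
```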

RE: Materialized views in HBase/Phoenix

2019-09-27 Thread Gautham Acharya
My first email details the use cases: 1. Get values for certain row/column sets – this is where HBase comes in handy, as we can easily query based on row key and column. No more than 500 rows and 30 columns will be queried. 2. Get an entire column. 3. Get aggregations per…
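
As a sketch, the three access patterns above expressed as Phoenix queries through the phoenixdb driver (which talks to the Phoenix Query Server); the table and column names are invented.

```python
# Hedged sketch of the three access patterns as Phoenix SQL. The 'matrix'
# table, its columns, and the server URL are all hypothetical.
import phoenixdb

conn = phoenixdb.connect('http://phoenix-query-server:8765/', autocommit=True)
cur = conn.cursor()

# 1. Values for specific row/column sets (<= 500 rows, <= 30 columns).
cur.execute(
    "SELECT row_id, col_a, col_b FROM matrix WHERE row_id IN (?, ?, ?)",
    ('r1', 'r2', 'r3'),
)

# 2. An entire column: a full scan projecting one column.
cur.execute("SELECT col_a FROM matrix")

# 3. Aggregations, e.g. a per-group mean; the open question in this thread
# is whether this stays under 100ms at billions of rows.
cur.execute("SELECT row_property, AVG(col_a) FROM matrix GROUP BY row_property")
print(cur.fetchall())
```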

Re: Materialized views in HBase/Phoenix

2019-09-27 Thread Pedro Boado
Yeah, Phoenix won't aggregate billions of rows in under 100ms (probably, nothing will). This sounds more and more like an OLAP use case, doesn't it? A fact table with billions of rows (still, you can handle those volumes with a sharded RDBMS) that will never be queried directly... and precomputed…

RE: Materialized views in HBase/Phoenix

2019-09-27 Thread Gautham Acharya
We are looking to support hundreds of concurrent queries, but not too many more. Will aggregations be performant across these large datasets (e.g., give me the mean value of each column when all rows are grouped by a certain row property)? Precomputing seems much more efficient.
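
A sketch of the precomputation Gautham is leaning toward: materialize the per-group means into a small table, so reads cost on the order of the number of groups rather than the number of rows. Phoenix does support UPSERT ... SELECT; the table and column names here are hypothetical.

```python
# Build a precomputed aggregate table that concurrent readers hit directly,
# instead of aggregating billions of rows per query. Names are placeholders.
import phoenixdb

conn = phoenixdb.connect('http://phoenix-query-server:8765/', autocommit=True)
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS matrix_means (
        row_property VARCHAR NOT NULL PRIMARY KEY,
        mean_col_a   DOUBLE
    )
""")

# Rebuild the aggregate offline; serving queries never touch the fact table.
cur.execute("""
    UPSERT INTO matrix_means
    SELECT row_property, AVG(col_a) FROM matrix GROUP BY row_property
""")
```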

Re: Materialized views in HBase/Phoenix

2019-09-27 Thread Pedro Boado
Can the aggregation be run on the fly in a Phoenix query? A 100ms response time, but... with how many concurrent queries? On Fri, 27 Sep 2019, 17:23 Gautham Acharya wrote: > We will be reaching 100 million rows early next year, and then billions > shortly after that. So, HBase will be needed to…
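
Pedro's two questions (the 100ms target, and at how many concurrent queries) are measurable; a toy probe along these lines, with run_query standing in for the real Phoenix call, would put numbers on it.

```python
# Toy concurrency probe; run_query is a stand-in for the real client call.
import time
from concurrent.futures import ThreadPoolExecutor

def run_query(_i: int) -> float:
    start = time.perf_counter()
    # ... execute the Phoenix aggregation here ...
    return (time.perf_counter() - start) * 1000.0   # latency in milliseconds

# "Hundreds of concurrent queries": 100 workers, 1000 queries total.
with ThreadPoolExecutor(max_workers=100) as pool:
    latencies = sorted(pool.map(run_query, range(1000)))

print('p99 latency: %.1f ms' % latencies[int(len(latencies) * 0.99)])
```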

RE: Materialized views in HBase/Phoenix

2019-09-27 Thread Gautham Acharya
We will be reaching 100 million rows early next year, and then billions shortly after that, so HBase will be needed to scale to that degree. If one of the tables fails to write, we need some kind of rollback mechanism, which is why I was considering a transaction. We cannot be in a partial…
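
One pattern that avoids a multi-table transaction entirely, sketched below with hypothetical table names: build each snapshot into a fresh table, then publish it by flipping a one-row pointer. HBase guarantees that single-row writes are atomic, so readers never observe the partial state Gautham is worried about.

```python
# Sketch: versioned snapshot tables plus a one-row pointer, instead of a
# cross-table transaction. All table names here are hypothetical.
import phoenixdb

conn = phoenixdb.connect('http://phoenix-query-server:8765/', autocommit=True)
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS agg_pointer (
        id            VARCHAR NOT NULL PRIMARY KEY,
        current_table VARCHAR
    )
""")

new_table = 'MATRIX_AGG_20190927'   # hypothetical versioned snapshot table

# 1. Build new_table completely (bulk load, verification, etc.) ...
# 2. ... then publish it with one write. A single-row write is atomic in
#    HBase, so readers see either the old snapshot or the new one; a failed
#    build simply never flips the pointer, and nothing needs rolling back.
cur.execute(
    "UPSERT INTO agg_pointer (id, current_table) VALUES (?, ?)",
    ('matrix_agg', new_table),
)
```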

Re: Materialized views in HBase/Phoenix

2019-09-27 Thread Pedro Boado
For just a few million rows I would go with an RDBMS, not Phoenix/HBase. You don't really need transactions to control completion; just write a flag (a COMPLETED empty file, for instance) as the final step of your job. On Fri, 27 Sep 2019, 15:03 Gautham Acharya wrote: > Thanks Anil. > > >…
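
Pedro's flag suggestion in code form: a minimal sketch where the job's last action is to touch an empty COMPLETED marker, and consumers check for it before reading. Paths are hypothetical; on S3 the same marker could be written with boto3.

```python
# Completion flag instead of a transaction: write all output first, then
# publish an empty COMPLETED marker as the very last step of the job.
from pathlib import Path

def run_job(output_dir: str) -> None:
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # ... write all output tables/files here ...
    (out / 'COMPLETED').touch()     # final step: publish the flag

def is_ready(output_dir: str) -> bool:
    # Consumers treat the output as valid only once the flag exists.
    return (Path(output_dir) / 'COMPLETED').exists()
```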

RE: Materialized views in HBase/Phoenix

2019-09-27 Thread Gautham Acharya
Thanks Anil. So what you’re essentially advocating for is using some kind of Spark/compute framework (I was going to use AWS Glue) to write the ‘materialized views’ as separate tables (maybe tied together with some kind of naming convention?). In this case, we’d end up with some sticky…

Re: Materialized views in HBase/Phoenix

2019-09-27 Thread anil gupta
For your use case, I would suggest creating another table that stores the matrix. Since this data doesn't change that often, maybe you can write a nightly Spark/MR job to update/rebuild the matrix table. (If you want near real time, that is also possible with any streaming system.) Have you looked…
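
A sketch of the nightly rebuild Anil describes, using PySpark with the phoenix-spark connector (which requires overwrite mode on write). The zkUrl and table names are placeholders, and the Phoenix client jars are assumed to be on the Spark classpath.

```python
# Nightly job sketch: read the fact table through the phoenix-spark
# connector, recompute the aggregate, and overwrite the serving table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('nightly-matrix-rebuild').getOrCreate()

matrix = (
    spark.read.format('org.apache.phoenix.spark')
    .option('table', 'SOURCE_FACTS')                 # hypothetical source
    .option('zkUrl', 'zookeeper-host:2181')          # placeholder quorum
    .load()
    .groupBy('ROW_PROPERTY')
    .avg('COL_A')
    .withColumnRenamed('avg(COL_A)', 'MEAN_COL_A')
)

(matrix.write.format('org.apache.phoenix.spark')
    .mode('overwrite')                               # connector requires it
    .option('table', 'MATRIX_MEANS')                 # hypothetical target
    .option('zkUrl', 'zookeeper-host:2181')
    .save())
```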

RE: Performance degradation on query analysis

2019-09-27 Thread Stepan Migunov
Thanks Josh, you are right; we had actually disabled automatic major compaction. We have now added SYSTEM.STATS to the weekly compaction schedule, and I hope this resolves the issue. -Original Message- From: Josh Elser [mailto:els...@apache.org] Sent: Tuesday, September 24, 2019 6:39 PM To:…
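
For reference, the weekly step Stepan describes could be scripted by piping the standard major_compact command into the HBase shell; the wrapper below is a sketch, with scheduling (cron or similar) left out.

```python
# Sketch: trigger a major compaction of SYSTEM.STATS by feeding the
# standard shell command to `hbase shell` on stdin.
import subprocess

def major_compact(table: str = 'SYSTEM.STATS') -> None:
    subprocess.run(
        ['hbase', 'shell'],
        input="major_compact '%s'\nexit\n" % table,
        text=True,
        check=True,
    )

major_compact()
```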