Thanks Jim. While performance tuning of the queries will definitely help, I would also like to know whether there is a general practice for querying metrics that have multiple dimensions. For example, for a given metric like LTV, I have about 65 different ways to segment it, based on combinations of product/SKU and time interval.
Thanks!

On Tue, Oct 18, 2016 at 9:37 AM, Jim Apple <jbap...@cloudera.com> wrote:
> This might help:
> http://www.cloudera.com/documentation/enterprise/latest/topics/impala_performance.html
>
> On Tue, Oct 18, 2016 at 12:30 AM, Buntu Dev <buntu...@gmail.com> wrote:
> > I got table of user purchases and subscriptions with various product skus
> > along with user attributes in a single table (~1g and 20M rows).
> >
> > Due to the number of combinations for slicing and dicing the data, it takes
> > a while to query for churn, retention, etc. on the dataset for various time
> > periods and product skus selected and makes it not ideal the frontend.
> > Generating a precomputed table with all the combinations is pretty
> > exhausting, so I'm look to see if there are any best practices in designing
> > a schema to overcome these issues.
> >
> > Thanks!