Re: Design schema for faster analytics
Thanks Jim. While performance tuning of the queries will definitely help, I would also like to know whether there is a general practice for querying metrics with multiple dimensions. For example, for a given metric like LTV, I have about 65 different ways to segment it, based on product/SKU and time interval combinations.

Thanks!

On Tue, Oct 18, 2016 at 9:37 AM, Jim Apple wrote:
> This might help:
> http://www.cloudera.com/documentation/enterprise/latest/topics/impala_performance.html
>
> On Tue, Oct 18, 2016 at 12:30 AM, Buntu Dev wrote:
> > I have user purchases and subscriptions with various product SKUs,
> > along with user attributes, in a single table (~1 GB and 20M rows).
> >
> > Due to the number of combinations for slicing and dicing the data, it
> > takes a while to query for churn, retention, etc. on the dataset for the
> > various time periods and product SKUs selected, which makes it less than
> > ideal for the frontend. Generating a precomputed table with all the
> > combinations is pretty exhausting, so I'm looking to see if there are any
> > best practices in designing a schema to overcome these issues.
> >
> > Thanks!
Re: Design schema for faster analytics
This might help:
http://www.cloudera.com/documentation/enterprise/latest/topics/impala_performance.html

On Tue, Oct 18, 2016 at 12:30 AM, Buntu Dev wrote:
> I have user purchases and subscriptions with various product SKUs,
> along with user attributes, in a single table (~1 GB and 20M rows).
>
> Due to the number of combinations for slicing and dicing the data, it
> takes a while to query for churn, retention, etc. on the dataset for the
> various time periods and product SKUs selected, which makes it less than
> ideal for the frontend. Generating a precomputed table with all the
> combinations is pretty exhausting, so I'm looking to see if there are any
> best practices in designing a schema to overcome these issues.
>
> Thanks!
Design schema for faster analytics
I have user purchases and subscriptions with various product SKUs, along with user attributes, in a single table (~1 GB and 20M rows).

Due to the number of combinations for slicing and dicing the data, it takes a while to query for churn, retention, etc. on the dataset for the various time periods and product SKUs selected, which makes it less than ideal for the frontend. Generating a precomputed table with all the combinations is pretty exhausting, so I'm looking to see if there are any best practices in designing a schema to overcome these issues.

Thanks!
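
To make the trade-off in the question concrete, here is a minimal sketch of one common alternative to precomputing every combination: aggregate once at the finest grain the frontend needs (here assumed to be one row per SKU per day) and roll up additive measures at query time. Nothing in this sketch comes from the thread itself. The row layout, the names `build_daily_rollup` and `query`, and the sample data are all hypothetical, and it is written in plain Python rather than Impala SQL purely for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw rows: (user_id, sku, purchase_date, amount).
purchases = [
    ("u1", "sku_a", date(2016, 10, 1), 10.0),
    ("u1", "sku_b", date(2016, 10, 2), 5.0),
    ("u2", "sku_a", date(2016, 10, 2), 10.0),
    ("u3", "sku_a", date(2016, 11, 1), 10.0),
]


def build_daily_rollup(rows):
    """Aggregate once at the assumed finest grain: one cell per (sku, day)."""
    rollup = defaultdict(lambda: {"revenue": 0.0, "orders": 0})
    for _user, sku, day, amount in rows:
        cell = rollup[(sku, day)]
        cell["revenue"] += amount
        cell["orders"] += 1
    return dict(rollup)


def query(rollup, skus=None, start=None, end=None):
    """Sum the additive measures over any SKU set and inclusive date range."""
    total = {"revenue": 0.0, "orders": 0}
    for (sku, day), cell in rollup.items():
        if skus is not None and sku not in skus:
            continue
        if start is not None and day < start:
            continue
        if end is not None and day > end:
            continue
        total["revenue"] += cell["revenue"]
        total["orders"] += cell["orders"]
    return total


rollup = build_daily_rollup(purchases)

# Any SKU / time-interval combination is answered from the same small rollup.
october_sku_a = query(
    rollup, skus={"sku_a"}, start=date(2016, 10, 1), end=date(2016, 10, 31)
)
all_time = query(rollup)
```

This only works directly for additive measures such as revenue and order counts. Metrics built on distinct users, such as churn and retention, cannot be summed across days or SKUs, so they would need a user-level or cohort-level table instead of (or alongside) a rollup like this one.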