Re: Design schema for faster analytics

2016-10-18 Thread Buntu Dev
Thanks, Jim. While performance tuning of the queries will definitely help, I
would also like to know if there is a general practice for querying metrics
with multiple dimensions. For example, for a given metric like LTV, I have
about 65 different ways to segment it, based on product/SKU and time-interval
combinations.

Thanks!

On Tue, Oct 18, 2016 at 9:37 AM, Jim Apple wrote:

> This might help:
> http://www.cloudera.com/documentation/enterprise/latest/topics/impala_performance.html
>
> On Tue, Oct 18, 2016 at 12:30 AM, Buntu Dev wrote:
> > I have user purchases and subscriptions with various product SKUs, along
> > with user attributes, in a single table (~1 GB and 20M rows).
> >
> > Due to the number of combinations for slicing and dicing the data, it takes
> > a while to query for churn, retention, etc. on the dataset for the various
> > time periods and product SKUs selected, which makes it not ideal for the
> > frontend. Generating a precomputed table with all the combinations is pretty
> > exhausting, so I'm looking to see if there are any best practices in
> > designing a schema to overcome these issues.
> >
> >
> > Thanks!
>


Re: Design schema for faster analytics

2016-10-18 Thread Jim Apple
This might help:
http://www.cloudera.com/documentation/enterprise/latest/topics/impala_performance.html

On Tue, Oct 18, 2016 at 12:30 AM, Buntu Dev wrote:
> I have user purchases and subscriptions with various product SKUs, along
> with user attributes, in a single table (~1 GB and 20M rows).
>
> Due to the number of combinations for slicing and dicing the data, it takes
> a while to query for churn, retention, etc. on the dataset for the various
> time periods and product SKUs selected, which makes it not ideal for the
> frontend. Generating a precomputed table with all the combinations is pretty
> exhausting, so I'm looking to see if there are any best practices in
> designing a schema to overcome these issues.
>
>
> Thanks!


Design schema for faster analytics

2016-10-18 Thread Buntu Dev
I have user purchases and subscriptions with various product SKUs, along
with user attributes, in a single table (~1 GB and 20M rows).

Due to the number of combinations for slicing and dicing the data, it takes
a while to query for churn, retention, etc. on the dataset for the various
time periods and product SKUs selected, which makes it not ideal for the
frontend. Generating a precomputed table with all the combinations is pretty
exhausting, so I'm looking to see if there are any best practices in
designing a schema to overcome these issues.


Thanks!