Hi Gary,

Would you mind posting your schema and query as well?

Thanks,
James
On Mon, Feb 23, 2015 at 5:08 PM, Gary Schulte <[email protected]> wrote:
> I am currently evaluating Phoenix 4.2.2 for an analytic workload. Several
> of the test cases are simple aggregations over sets of varying cardinality.
> For example, for one of the test aggregation queries I am seeing response
> times along the lines of:
>
>   fact query cardinality    response time (using primary index)
>   1k                        0.3
>   3k                        0.3
>   5k                        0.4
>   10k                       0.4
>   15k                       0.7
>   20k                       0.7
>   50k                       2.2
>   100k                      2.7
>   200k                      5.3
>   400k                      13.8
>   800k                      20.1
>   1.5 mil                   48.8
>
> where the plan looks like:
>
> +------------------------------------------+
> |                   PLAN                   |
> +------------------------------------------+
> | CLIENT 81-CHUNK PARALLEL 10-WAY SKIP SCAN ON 10 RANGES OVER
>     PERF.BIG_OLAP_DOC [0,10724,3600,1] - [9,10724,3685,1] |
> | SERVER FILTER BY (NETWORK = 'SEARCH' AND true AND CUSTOMER_ID = 545
>     AND KEYWORD_ID IN
>     (613213594,613214433,613216496,613216861,613217357,613218757,613218770,613218833,613218871,613218937,613218938,613220926,613239657,61 |
> | SERVER AGGREGATE INTO SINGLE ROW |
> +------------------------------------------+
>
> The performance is great for low-cardinality sets, but as the cardinality
> of the set to be aggregated grows, the response time appears to scale
> essentially linearly with the size of the set being queried/aggregated.
> This is a roughly 100-million-row, 10-bucket salted table on a 10-node
> cluster.
>
> I had presumed the value lookup might be the major cost factor, but if I
> remove all columns from the projection and criteria that are not part of
> the key, I only see about a 20% reduction in response time.
> +------------------------------------------+
> |                   PLAN                   |
> +------------------------------------------+
> | CLIENT 101-CHUNK PARALLEL 10-WAY SKIP SCAN ON 10 RANGES OVER
>     PERF.BIG_OLAP_DOC [0,10724,3000,1] - [9,10724,3685,1] |
> | SERVER FILTER BY FIRST KEY ONLY AND (true AND KEYWORD_ID IN
>     (613213594,613214433,613216496,613216861,613217357,613218757,613218770,613218833,613218871,613218937,613218938,613220926,613239657,613239663,613239668,61323969
> | SERVER AGGREGATE INTO SINGLE ROW |
> +------------------------------------------+
>
> So it seems the parallel index skip scan is the major factor. Is this
> expected?
>
> Insight into the expected performance and behavior of high-cardinality
> queries, and tips on optimizing for them, would be greatly appreciated.
>
> TIA
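A schema and aggregation query of the general shape implied by the plans above might look like the following sketch (Phoenix SQL). This is a hypothetical reconstruction only: the table name PERF.BIG_OLAP_DOC, the columns NETWORK, CUSTOMER_ID, and KEYWORD_ID, the 10 salt buckets, and the key-range constants are visible in the plans; every other column name, type, and metric is an assumption.

```sql
-- Hypothetical sketch; the actual schema was not posted in the thread.
-- DIM1/DIM2/DIM3 stand in for the three row-key columns visible in the
-- scanned key ranges [0,10724,3600,1] - [9,10724,3685,1] (the leading 0-9
-- component is the implicit salt byte). IMPRESSIONS is an assumed metric.
CREATE TABLE PERF.BIG_OLAP_DOC (
    DIM1        INTEGER NOT NULL,   -- fixed at 10724 in the plans
    DIM2        INTEGER NOT NULL,   -- ranges over 3600..3685 / 3000..3685
    DIM3        INTEGER NOT NULL,   -- fixed at 1 in the plans
    NETWORK     VARCHAR,
    CUSTOMER_ID INTEGER,
    KEYWORD_ID  BIGINT,
    IMPRESSIONS BIGINT,             -- assumed metric column
    CONSTRAINT pk PRIMARY KEY (DIM1, DIM2, DIM3)
) SALT_BUCKETS = 10;

-- An aggregation of the shape implied by the first plan: the leading-key
-- predicates drive the 10-range skip scan (one range per salt bucket),
-- while the non-key predicates become the SERVER FILTER.
SELECT SUM(IMPRESSIONS)
FROM PERF.BIG_OLAP_DOC
WHERE DIM1 = 10724
  AND DIM2 BETWEEN 3600 AND 3685
  AND DIM3 = 1
  AND NETWORK = 'SEARCH'
  AND CUSTOMER_ID = 545
  AND KEYWORD_ID IN (613213594, 613214433 /* , ... */);
```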
