Can you describe how using a dimension in the WHERE clause differs from
using it in GROUP BY, and how each case should be handled in aggregation
groups (AGG)?

e.g. in the example
http://kylin.apache.org/blog/2016/02/18/new-aggregation-group/

buyer_id is a high-cardinality dimension.  Two aggregation groups are
created: one with all dimensions including buyer_id (used in the WHERE
clause), and one with all dimensions except buyer_id.

This is still confusing to me when I try to generalize it.  Say I have
dimensions (A, B, C, D, E, F, G).  I need to GROUP BY half of them, and
the other half are used only as filters.  How does cardinality relate to
how a dimension is used (GROUP BY vs WHERE)?
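
As a rough illustration of the combinatorics in question, the Python sketch
below counts dimension combinations (cuboids) for the earlier setup quoted
further down: one aggregation group over ABCD123 versus one group per page 2
field.  It assumes each group simply builds every non-empty combination of
its dimensions and that combinations shared between groups are counted once.
It ignores mandatory, hierarchy, and joint dimension settings, and the
function names are hypothetical, not part of any Kylin API.

```python
from itertools import combinations


def cuboids(dims):
    """Return every non-empty combination of dims (simplified model)."""
    dims = sorted(dims)
    return {
        frozenset(combo)
        for size in range(1, len(dims) + 1)
        for combo in combinations(dims, size)
    }


def total_cuboids(groups):
    """Count distinct combinations across all aggregation groups."""
    seen = set()
    for group in groups:
        seen |= cuboids(group)  # combinations shared by groups count once
    return len(seen)


# One group containing all seven dimensions.
single_group = [["A", "B", "C", "D", "1", "2", "3"]]

# One group per page 2 field, each sharing the page 1 filters A, B, C, D.
per_field_groups = [
    ["A", "B", "C", "D", "1"],
    ["A", "B", "C", "D", "2"],
    ["A", "B", "C", "D", "3"],
]

print(total_cuboids(single_group))      # 127
print(total_cuboids(per_field_groups))  # 63
```

Under these assumptions, splitting into per-field groups roughly halves the
number of combinations, which says nothing yet about how cardinality or
WHERE-only usage should change the grouping.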

Thanks


On Thu, Jun 29, 2017 at 3:28 AM, Li Yang <[email protected]> wrote:

> The approach sounds good to me and makes sense.
>
> > The cube build time is taking forever.
>
> Well, that depends more on your Hadoop environment, I guess. A cube with
> 6 dimensions is indeed small.
>
> On Thu, Jun 22, 2017 at 10:49 PM, Sonny Heer <[email protected]> wrote:
>
>> Hi users,
>>
>> I need some clarification on how to properly use aggregation groups.
>>
>> Assume I have report page 1, which has filters A, B, C, D.  When the user
>> goes to page 2, these filters are passed along (drilldown).  Page 2 has
>> other filterable fields (1, 2, 3), but each is combined only with the
>> filters carried over from page 1.  E.g. page 2 fields never need to be
>> combined with another page 2 field: ABCD with 1, but not ABCD with 1 & 2.
>>
>> So I created one aggregation group per field in page 2.  The idea was to
>> avoid building 2^n combinations over ABCD123 and build only ABCD1, ABCD2,
>> etc.  I'm not sure if this is the correct way to handle it.  The cube
>> build is taking forever.  Please advise...
>


-- 


Sonny S. Heer
Senior Software Engineer
m: 360-434-4354 h: 509-884-2574
