Re: IN_THRESHOLD

2016-11-17 Thread Li Yang
A filter on a derived column has to be translated into a filter on its PK.

E.g. say USER_NAME is a derived column (not on the cube) and USER_ID is its
PK (on the cube). When the filter USER_NAME='liyang' comes in, it needs to be
translated into USER_ID in (1,211,382), where IDs 1, 211 and 382 are the
three users whose name is 'liyang'.

Now suppose 'liyang' is such a common name that there are thousands of
'liyang's. Then the IN clause becomes very long and can cause performance
problems during storage scanning. In such a case, the filter can be
translated into a range filter instead, like USER_ID between 1 and 382.

The threshold is used to decide whether the translation returns an IN
condition or a range condition.
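
As a rough illustration (a sketch only, not the actual DerivedFilterTranslator
code), the decision could look like the Java below. The class name, the
translate method, the in-memory lookup map, the threshold passed in as a
parameter, and the exact comparison against the threshold are all assumptions
made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: translate a filter on a derived column (USER_NAME)
// into a filter on its PK (USER_ID), choosing IN or a range by a threshold.
class DerivedFilterSketch {

    // lookup: USER_ID -> USER_NAME, standing in for the lookup table.
    // inThreshold: assumed maximum number of IDs allowed in an IN condition.
    static String translate(Map<Integer, String> lookup, String userName, int inThreshold) {
        // Collect every PK value whose derived value matches the filter.
        List<Integer> ids = new ArrayList<>();
        for (Map.Entry<Integer, String> e : lookup.entrySet()) {
            if (e.getValue().equals(userName)) {
                ids.add(e.getKey());
            }
        }
        if (ids.isEmpty()) {
            // Assumption: no matching user means the filter matches nothing.
            return "FALSE";
        }
        if (ids.size() <= inThreshold) {
            // Few matches: an exact IN condition stays short.
            String list = ids.stream().map(String::valueOf).collect(Collectors.joining(", "));
            return "USER_ID IN (" + list + ")";
        }
        // Many matches: fall back to a range covering all matching IDs.
        // The range can also include IDs of other users, so it is looser
        // than the original filter.
        return "USER_ID BETWEEN " + Collections.min(ids) + " AND " + Collections.max(ids);
    }

    public static void main(String[] args) {
        Map<Integer, String> lookup = new LinkedHashMap<>();
        lookup.put(1, "liyang");
        lookup.put(7, "alberto");
        lookup.put(211, "liyang");
        lookup.put(382, "liyang");

        System.out.println(translate(lookup, "liyang", 5)); // IN condition
        System.out.println(translate(lookup, "liyang", 2)); // range condition
    }
}
```

Note that the range form is a superset of the exact match (USER_ID 7 falls
inside 1..382 here), so the scan is cheaper to express but may return rows
that still need to be checked against the original USER_NAME filter.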

Cheers
Yang

On Wed, Nov 16, 2016 at 12:35 AM, Alberto Ramón <a.ramonporto...@gmail.com>
wrote:

> About Kylin 2193
> What is the purpose of
> org.apache.kylin.storage.translate.DerivedFilterTranslator#
> IN_THRESHOLD ? :)
> (When is it used?)
>


IN_THRESHOLD

2016-11-15 Thread Alberto Ramón
About Kylin 2193
What is the purpose of
org.apache.kylin.storage.translate.DerivedFilterTranslator#IN_THRESHOLD ?
:)
(When is it used?)