[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole

2017-11-28 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16270272#comment-16270272
 ] 

Shaofeng SHI commented on KYLIN-1869:
-

[~yaho]yanghong, what's the status of this? Please update the status in time.

> When building snapshot for lookup tables, should we build those dimensions 
> used by model or the whole
> -
>
> Key: KYLIN-1869
> URL: https://issues.apache.org/jira/browse/KYLIN-1869
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Reporter: Zhong Yanghong
>Assignee: Zhong Yanghong
>
> Currently when building a snapshot for a lookup table, the input is the whole 
> value set of the lookup table, which may not be reasonable. In some cases, 
> a lookup table has tens of columns. However, a model or a cube uses only a 
> few of them, 1 to 5. Those unused columns make the snapshot too large, 
> which is a burden for both storing and loading.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole

2016-07-24 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390987#comment-15390987
 ] 

liyang commented on KYLIN-1869:
---

Ganbatte (good luck)~~  :-)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole

2016-07-18 Thread hongbin ma (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382233#comment-15382233
 ] 

hongbin ma commented on KYLIN-1869:
---

Sorry, https://issues.apache.org/jira/browse/KYLIN-1313 might not be able to 
overcome the snapshot's size limit.



[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole

2016-07-18 Thread hongbin ma (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382230#comment-15382230
 ] 

hongbin ma commented on KYLIN-1869:
---

Will https://issues.apache.org/jira/browse/KYLIN-1313 possibly help? I haven't 
tried it for columns on a lookup table, but I assume it will work.




[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole

2016-07-17 Thread Zhong Yanghong (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15381721#comment-15381721
 ] 

Zhong Yanghong commented on KYLIN-1869:
---

I'd like to do this. :)



[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole

2016-07-17 Thread Zhong Yanghong (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15381713#comment-15381713
 ] 

Zhong Yanghong commented on KYLIN-1869:
---

In some cases, the lookup table is very big. For example, take a table of 
seller information where the cardinality of sell_id is more than 10 million. 
Even if only the seller name (64 bytes) is included as a derived column, the 
size of the lookup table will be around 720MB. Currently we put the seller 
name together with the seller id into the fact table. In 1.5.2, we can use 
joint dimensions for these two columns. Is there any better solution for 
this kind of high-cardinality lookup column?
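
As a rough check of the 720MB figure above, here is a minimal sketch of the arithmetic. It assumes fixed-width rows with an 8-byte sell_id next to the 64-byte seller name. The 8-byte id width and the function name are illustrative assumptions, not Kylin's actual snapshot encoding.

```python
def estimate_snapshot_bytes(row_count, column_widths):
    """Estimate snapshot size as rows times the sum of per-column byte widths."""
    return row_count * sum(column_widths)


# Assumed widths: 8 bytes for sell_id (hypothetical), 64 bytes for seller name.
widths = {"sell_id": 8, "seller_name": 64}
size = estimate_snapshot_bytes(10_000_000, widths.values())
print(f"{size / 1_000_000:.0f} MB")  # prints "720 MB"
```

Under these assumptions the two columns alone are more than twice the 300M threshold mentioned elsewhere in this thread.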



[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole

2016-07-14 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378253#comment-15378253
 ] 

liyang commented on KYLIN-1869:
---

I'm neutral from a design point of view, so I lack the motivation to 
implement another version.  :-)

We could attach more metadata to the snapshot to describe which columns it 
contains, and support both full and tailored snapshots, if someone wants to 
do it.
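
One way to picture the idea above is the following minimal sketch: a snapshot that records which columns it contains and can be built either full or tailored. The Snapshot class, the build_snapshot function, and the sample table are all hypothetical and are not part of Kylin's code base.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Snapshot:
    table: str
    columns: Tuple[str, ...]  # metadata: which columns this snapshot contains
    rows: Tuple[tuple, ...]
    tailored: bool            # False means every source column is included


def build_snapshot(table: str,
                   all_columns: Sequence[str],
                   rows: Sequence[tuple],
                   used_columns: Optional[Sequence[str]] = None) -> Snapshot:
    """Build a full snapshot, or a tailored one when used_columns is given."""
    if used_columns is None:
        keep = list(all_columns)
    else:
        missing = [c for c in used_columns if c not in all_columns]
        if missing:
            raise ValueError(f"unknown columns: {missing}")
        # Preserve the source table's column order.
        keep = [c for c in all_columns if c in used_columns]
    idx = [all_columns.index(c) for c in keep]
    projected = tuple(tuple(row[i] for i in idx) for row in rows)
    return Snapshot(table, tuple(keep), projected,
                    tailored=len(keep) < len(all_columns))


cols = ["sell_id", "seller_name", "address", "phone"]
data = [(1, "Alice", "1 Main St", "555-0100"),
        (2, "Bob", "2 Oak Ave", "555-0101")]
full = build_snapshot("seller", cols, data)
small = build_snapshot("seller", cols, data, used_columns=["sell_id", "seller_name"])
print(small.columns)  # ('sell_id', 'seller_name')
print(small.rows)     # ((1, 'Alice'), (2, 'Bob'))
```

Recording the column list on the snapshot itself lets a reader of the snapshot tell a full one from a tailored one without consulting the model.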



[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole

2016-07-14 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376931#comment-15376931
 ] 

Shaofeng SHI commented on KYLIN-1869:
-

I agree with Yang; usually a lookup table shouldn't be very big. If it does 
have too many columns, you can use a view to shadow it.
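
The view workaround can be sketched as follows: a small helper that generates a standard CREATE VIEW statement exposing only the columns a model needs. The helper name, the table and view names, and the column names are hypothetical; the generated statement would still have to be run in Hive and the view then used as the lookup table.

```python
import re
from typing import Sequence

# Plain or database-qualified identifiers only, to keep the generated DDL safe.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def tailored_view_ddl(view_name: str, source_table: str,
                      columns: Sequence[str]) -> str:
    """Return a CREATE VIEW statement exposing only `columns` of `source_table`."""
    if not columns:
        raise ValueError("at least one column is required")
    for name in (view_name, source_table, *columns):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return (f"CREATE VIEW {view_name} AS "
            f"SELECT {', '.join(columns)} FROM {source_table}")


ddl = tailored_view_ddl("default.seller_lookup_v", "default.seller",
                        ["sell_id", "seller_name"])
print(ddl)
```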





[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole

2016-07-14 Thread Zhong Yanghong (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376588#comment-15376588
 ] 

Zhong Yanghong commented on KYLIN-1869:
---

After upgrading to 1.5.2, a view with tailored columns can be used for the 
lookup table. However, I still think building the snapshot based only on the 
model is an easy way.



[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole

2016-07-12 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373460#comment-15373460
 ] 

liyang commented on KYLIN-1869:
---

For snapshots that are too big, there is a good reason to tailor columns.

Other easy workarounds are 1) increase the threshold; 2) copy the lookup table 
or use a view with tailored columns.



[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole

2016-07-11 Thread Zhong Yanghong (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371678#comment-15371678
 ] 

Zhong Yanghong commented on KYLIN-1869:
---

There's a case where the lookup table snapshot is larger than the current 
threshold, 300M. However, the users only need several of its roughly 30 
columns. Is there a better way to deal with this?



[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole

2016-07-10 Thread Zhong Yanghong (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370160#comment-15370160
 ] 

Zhong Yanghong commented on KYLIN-1869:
---

Hi [~liyang.g...@gmail.com] and [~Shaofengshi], what do you think of this point 
of view?
