[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole
[ https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16270272#comment-16270272 ]

Shaofeng SHI commented on KYLIN-1869:
-------------------------------------

[~yaho]yanghong, what's the status of it? Please update the status in time.

> When building snapshot for lookup tables, should we build those dimensions used by model or the whole
> ------------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-1869
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1869
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>            Reporter: Zhong Yanghong
>            Assignee: Zhong Yanghong
>
> Currently, when building a snapshot for a lookup table, the input is the whole
> value set of the lookup table, which may not be reasonable. In some cases,
> a lookup table has tens of columns. However, the columns used by a model or a
> cube are only a few, 1 to 5. Those unused columns will make the snapshot too
> large, which places a burden on both storing and loading.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole
[ https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390987#comment-15390987 ]

liyang commented on KYLIN-1869:
-------------------------------

gangbade~~ :-)
[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole
[ https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382233#comment-15382233 ]

hongbin ma commented on KYLIN-1869:
-----------------------------------

Sorry, https://issues.apache.org/jira/browse/KYLIN-1313 might not be able to overcome the snapshot's size limit.
[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole
[ https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382230#comment-15382230 ]

hongbin ma commented on KYLIN-1869:
-----------------------------------

Will https://issues.apache.org/jira/browse/KYLIN-1313 possibly help? I haven't tried it for columns on a lookup table, but I assume it will work.
[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole
[ https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15381721#comment-15381721 ]

Zhong Yanghong commented on KYLIN-1869:
---------------------------------------

I'd like to do this. :)
[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole
[ https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15381713#comment-15381713 ]

Zhong Yanghong commented on KYLIN-1869:
---------------------------------------

In some cases, the lookup table is very big. For example, take a table of seller information where the cardinality of sell_id is more than 10 million. Even if only the seller name (64 bytes) is included as a derived column, the size of the lookup table will be around 720MB. Currently we put the seller name together with the seller id into the fact table. In 1.5.2, we can use joint for these two columns. Is there a better solution for this kind of high-cardinality lookup column?
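As a rough check of the 720MB figure in the comment above, here is a small sketch of the arithmetic. The comment gives only the cardinality (10 million) and the seller name width (64 bytes); the 8-byte width for sell_id is an assumption that happens to reproduce the quoted number, the 300M threshold is the value mentioned elsewhere in this thread (read here as decimal megabytes, also an assumption), and the function name is hypothetical. Any per-row overhead of the real snapshot format is ignored.

```python
def estimate_snapshot_bytes(row_count, column_widths_bytes):
    """Raw size estimate: number of rows times the sum of the column widths."""
    return row_count * sum(column_widths_bytes)


ROWS = 10_000_000                 # cardinality of sell_id, from the comment
SELLER_NAME_BYTES = 64            # from the comment
SELL_ID_BYTES = 8                 # assumption: an 8-byte key
THRESHOLD_BYTES = 300 * 1_000_000 # assumption: "300M" read as decimal megabytes

size = estimate_snapshot_bytes(ROWS, [SELL_ID_BYTES, SELLER_NAME_BYTES])
print(size / 1_000_000, "MB")
print("exceeds threshold:", size > THRESHOLD_BYTES)
```

Under these assumptions, two columns alone already put the snapshot well above the threshold, so in this case the row count rather than the column count is the dominant factor.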
[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole
[ https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378253#comment-15378253 ]

liyang commented on KYLIN-1869:
-------------------------------

I'm neutral from a design point of view, and thus lack the motivation to implement another version. :-) We could attach more metadata to the snapshot to describe which columns it contains and support both full and tailored snapshots, if someone wants to do it.
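The metadata idea in the comment above could look roughly like the following. This is only an illustrative sketch, not Kylin code: `SnapshotMeta`, `covers`, and `pick_snapshot` are hypothetical names, and the table and column names are made up. Each snapshot records which columns it holds, so a cube can reuse any snapshot that covers the columns it needs, with the full snapshot as the general fallback.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class SnapshotMeta:
    """Hypothetical metadata record attached to one lookup-table snapshot."""
    table: str
    columns: FrozenSet[str]
    is_full: bool  # True when the snapshot holds every column of the table

    def covers(self, required: Iterable[str]) -> bool:
        # A full snapshot serves any request; a tailored one must contain
        # every required column.
        return self.is_full or set(required) <= self.columns


def pick_snapshot(snapshots: List[SnapshotMeta], table: str,
                  required: Iterable[str]) -> Optional[SnapshotMeta]:
    """Return the narrowest snapshot of `table` that covers `required`, if any."""
    required = list(required)
    candidates = [s for s in snapshots if s.table == table and s.covers(required)]
    # Prefer tailored over full, then fewer columns.
    return min(candidates, key=lambda s: (s.is_full, len(s.columns)), default=None)


snapshots = [
    SnapshotMeta("SELLER_INFO",
                 frozenset({"SELL_ID", "SELLER_NAME", "REGION", "RATING"}),
                 is_full=True),
    SnapshotMeta("SELLER_INFO",
                 frozenset({"SELL_ID", "SELLER_NAME"}),
                 is_full=False),
]
chosen = pick_snapshot(snapshots, "SELLER_INFO", ["SELL_ID", "SELLER_NAME"])
print(chosen)
```

Preferring the narrowest match keeps loading cheap for cubes that need few columns, while the full snapshot can still be shared by everything else.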
[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole
[ https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376931#comment-15376931 ]

Shaofeng SHI commented on KYLIN-1869:
-------------------------------------

I agree with Yang; usually a lookup table shouldn't be very big. If it does have too many columns, you can use a view to shadow it.
[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole
[ https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376588#comment-15376588 ]

Zhong Yanghong commented on KYLIN-1869:
---------------------------------------

After upgrading to 1.5.2, a view with tailored columns for the lookup table can be used. However, I still think building the snapshot based only on the model is an easy way.
[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole
[ https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373460#comment-15373460 ]

liyang commented on KYLIN-1869:
-------------------------------

For snapshots that are too big, there is good reason to tailor columns. Other easy workarounds are 1) increase the threshold; 2) copy the lookup table or use a view with tailored columns.
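For the second workaround above, a view with tailored columns is a plain Hive view that selects only the columns the model uses. The sketch below just builds such a statement as a string; the helper name, the table name `default.seller_info`, the view name `default.seller_info_lite`, and the column names are all hypothetical, and the statement would still need to be run in Hive and the view synced into Kylin as the lookup table.

```python
def tailored_view_ddl(source_table, view_name, columns):
    """Build a Hive CREATE VIEW statement that keeps only `columns`."""
    if not columns:
        raise ValueError("at least one column is required")
    # Backticks quote identifiers in HiveQL.
    column_list = ", ".join(f"`{c}`" for c in columns)
    return (f"CREATE VIEW IF NOT EXISTS {view_name} AS "
            f"SELECT {column_list} FROM {source_table}")


ddl = tailored_view_ddl("default.seller_info",
                        "default.seller_info_lite",
                        ["sell_id", "seller_name"])
print(ddl)
```

A view reduces the number of columns in the snapshot but not the number of rows, so it helps with wide tables more than with high-cardinality ones.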
[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole
[ https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371678#comment-15371678 ]

Zhong Yanghong commented on KYLIN-1869:
---------------------------------------

There's a case where the size of the lookup table snapshot is larger than the current threshold, 300M. However, only several of its roughly 30 columns are needed. Is there a better way to deal with this?
[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole
[ https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371233#comment-15371233 ]

liyang commented on KYLIN-1869:
-------------------------------

The current design -- taking a snapshot of all columns -- allows a snapshot to be shared globally, across cubes and models. Tailoring columns will reduce the snapshot size for one cube, at the cost that snapshots become less general, and we may have to build multiple snapshots for one lookup table. There are pros and cons on both sides.
[jira] [Commented] (KYLIN-1869) When building snapshot for lookup tables, should we build those dimensions used by model or the whole
[ https://issues.apache.org/jira/browse/KYLIN-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15370160#comment-15370160 ]

Zhong Yanghong commented on KYLIN-1869:
---------------------------------------

Hi [~liyang.g...@gmail.com] and [~Shaofengshi], what do you think of this point of view?