[jira] [Commented] (KYLIN-3628) Query with lookup table always use latest snapshot

2019-06-28 Thread XiaoXiang Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875378#comment-16875378
 ] 

XiaoXiang Yu commented on KYLIN-3628:
-

Thank you for reporting, [~seva_ostapenko]. You have provided a very detailed 
analysis; I will check it.

> Query with lookup table always use latest snapshot
> --
>
> Key: KYLIN-3628
> URL: https://issues.apache.org/jira/browse/KYLIN-3628
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Na Zhai
>Assignee: Na Zhai
>Priority: Major
> Fix For: v2.6.0
>
>
> If a user queries a lookup table, Kylin randomly selects a cube (one that has 
> a snapshot of this lookup table) to answer it. This causes uncertainty when 
> multiple cubes share the same lookup table: some cubes are newly built, 
> some are not. If Kylin picks an old cube, the query result is old.
> To remove this uncertainty, such queries should either always use the latest 
> snapshot or always use the earliest snapshot. We believe the "latest" version is better.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KYLIN-3628) Query with lookup table always use latest snapshot

2019-06-28 Thread Vsevolod Ostapenko (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874997#comment-16874997
 ] 

Vsevolod Ostapenko edited comment on KYLIN-3628 at 6/28/19 7:32 PM:


This code change introduces a nasty bug, where Kylin will pick a random cube to 
answer a query that goes against a lookup table.
For example, model X has two cubes, C1 and C2. C1 uses lookup table L1 and 
C2 does not. C2 has more recently built segments than C1.
A query "select * from L1" will fail with an error stating that C2 does not 
contain L1.

Code analysis indicates that LookupTableEnumerator overwrites the prior cube 
choice correctly made by RealizationChooser. The bug is that 
LookupTableEnumerator finds the latest snapshot across all the realizations of all 
the cubes in the model, instead of keeping the one that was already correctly chosen.
That leads to random behavior and unexpected failures.
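
The scenario above can be reproduced in isolation with a toy model. This is only a minimal sketch, not Kylin code: the Cube class, pickByModelLevelCheck, and the build-time values are hypothetical stand-ins, and the selection logic merely approximates the behavior described in this comment.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Toy reproduction of the C1/C2 scenario; none of these types are Kylin classes.
class LookupRegressionSketch {

    // Hypothetical stand-in for a cube: the lookup tables it snapshots
    // and the time of its most recent segment build.
    static final class Cube {
        final String name;
        final Set<String> snapshotTables;
        final long lastBuildTime;

        Cube(String name, Set<String> snapshotTables, long lastBuildTime) {
            this.name = name;
            this.snapshotTables = snapshotTables;
            this.lastBuildTime = lastBuildTime;
        }
    }

    // Approximates the reported behavior: the lookup table is checked against
    // the model only, so every cube of the model is a candidate and the most
    // recently built one wins, whether or not it snapshots the table.
    static Cube pickByModelLevelCheck(List<Cube> cubes, Set<String> modelLookupTables,
                                      String lookupTable) {
        Cube latest = null;
        for (Cube cube : cubes) {
            if (modelLookupTables.contains(lookupTable)
                    && (latest == null || cube.lastBuildTime > latest.lastBuildTime)) {
                latest = cube;
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        Set<String> modelLookups = Collections.singleton("L1");
        Cube c1 = new Cube("C1", Collections.singleton("L1"), 100L);
        Cube c2 = new Cube("C2", Collections.<String>emptySet(), 200L);

        Cube picked = pickByModelLevelCheck(Arrays.asList(c1, c2), modelLookups, "L1");
        // prints: picked C2, has snapshot of L1: false
        System.out.println("picked " + picked.name
                + ", has snapshot of L1: " + picked.snapshotTables.contains("L1"));
    }
}
```

In this toy setup the newer cube C2 is selected even though it holds no snapshot of L1, which mirrors the failure described above.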

To address both the original issue (where a lookup table is snapshotted in multiple 
cubes and a suitable cube is picked without looking at the segment build times) 
and the regression introduced by the change, CubeManager.findLatestSnapshot 
needs to check whether the lookup table is actually snapshotted as part of the cube 
realization. So, if there is a mix of cubes that capture the lookup 
table and cubes that don't, only the ones that do capture it are ranked 
by build time.

The affected file is CubeManager.java. The bug is in this check:
{code:java}
if (realization.getModel().isLookupTable(lookupTableName)) {
{code}
getModel().isLookupTable() operates at the model level, across all the cubes, 
while the check needs to be scoped to the current cube only.
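
A cube-scoped selection could look roughly like the sketch below. It is an illustration under stated assumptions, not the actual CubeManager code: the Cube class and its fields are hypothetical, and this findLatestSnapshot is a simplified stand-in that takes toy cubes rather than Kylin realization entries.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Toy illustration of a cube-scoped check; none of these types are Kylin classes.
class CubeScopedSnapshotSketch {

    // Hypothetical stand-in for a cube: the lookup tables it snapshots
    // and the time of its most recent segment build.
    static final class Cube {
        final String name;
        final Set<String> snapshotTables;
        final long lastBuildTime;

        Cube(String name, Set<String> snapshotTables, long lastBuildTime) {
            this.name = name;
            this.snapshotTables = snapshotTables;
            this.lastBuildTime = lastBuildTime;
        }
    }

    // Only cubes that actually snapshot the lookup table are candidates;
    // among those, the most recently built one is returned.
    static Optional<Cube> findLatestSnapshot(List<Cube> cubes, String lookupTable) {
        return cubes.stream()
                .filter(cube -> cube.snapshotTables.contains(lookupTable))
                .max(Comparator.comparingLong((Cube cube) -> cube.lastBuildTime));
    }

    public static void main(String[] args) {
        List<Cube> cubes = Arrays.asList(
                new Cube("C1", Collections.singleton("L1"), 100L),
                new Cube("C2", Collections.<String>emptySet(), 200L),
                new Cube("C3", Collections.singleton("L1"), 150L));

        // prints: C3 (C2 is newer but does not snapshot L1)
        System.out.println(findLatestSnapshot(cubes, "L1").map(c -> c.name).orElse("none"));
        // prints: none (no cube snapshots L2)
        System.out.println(findLatestSnapshot(cubes, "L2").map(c -> c.name).orElse("none"));
    }
}
```

Returning an Optional makes the "no cube snapshots this table" case explicit, so a caller could fall back to the cube already chosen earlier instead of failing.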

 




[jira] [Comment Edited] (KYLIN-3628) Query with lookup table always use latest snapshot

2019-06-28 Thread Vsevolod Ostapenko (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874997#comment-16874997
 ] 

Vsevolod Ostapenko edited comment on KYLIN-3628 at 6/28/19 7:23 PM:


This code change introduces a nasty bug, where Kylin will pick a random cube to 
answer a query that goes against a lookup table.
For example, model X has two cubes, C1 and C2. C1 uses lookup table L1 and 
C2 does not. C2 has more recently built segments than C1.
A query "select * from L1" will fail with an error stating that C2 does not 
contain L1.

Code analysis indicates that LookupTableEnumerator overwrites the prior cube 
choice correctly made by RealizationChooser. The bug is that 
LookupTableEnumerator finds the latest snapshot across all the realizations of all 
the cubes in the model, instead of keeping the one that was already correctly chosen.
That leads to random behavior and unexpected failures.

To address both the original issue (where a lookup table is snapshotted in multiple 
cubes and a suitable cube is picked without looking at the segment build times) 
and the regression introduced by the change, CubeManager.findLatestSnapshot 
needs to check whether the lookup table is actually snapshotted as part of the cube 
realization. So, if there is a mix of cubes that capture the lookup 
table and cubes that don't, only the ones that do capture it are ranked 
by build time.

The affected file is CubeManager.java. The bug is in this check:
{code:java}
if (realization.getModel().isLookupTable(lookupTableName)) {
{code}
getModel() operates across all cubes, while the check needs to be scoped to the 
current cube only.

 




[jira] [Comment Edited] (KYLIN-3628) Query with lookup table always use latest snapshot

2019-06-28 Thread Vsevolod Ostapenko (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874997#comment-16874997
 ] 

Vsevolod Ostapenko edited comment on KYLIN-3628 at 6/28/19 3:32 PM:


This code change introduces a nasty bug, where Kylin will pick a random cube to 
answer a query that goes against a lookup table.
For example, model X has two cubes, C1 and C2. C1 uses lookup table L1 and 
C2 does not. C2 has more recently built segments than C1.
A query "select * from L1" will fail with an error stating that C2 does not 
contain L1.

Code analysis indicates that LookupTableEnumerator overwrites the prior cube 
choice correctly made by RealizationChooser. The bug is that 
LookupTableEnumerator finds the latest snapshot across all the realizations of all 
the cubes in the model, instead of keeping the one that was already correctly chosen.
That leads to random behavior and unexpected failures.

The affected code is in LookupTableEnumerator.java:
{code:java}
if (olapContext.realization instanceof CubeInstance) {
    // Cube already chosen by RealizationChooser
    cube = (CubeInstance) olapContext.realization;
    ProjectInstance project = cube.getProjectInstance();
    List<RealizationEntry> realizationEntries = project.getRealizationEntries();
    String lookupTableName = olapContext.firstTableScan.getTableName();
    CubeManager cubeMgr = CubeManager.getInstance(cube.getConfig());
    // The prior choice is overwritten here with the result of findLatestSnapshot
    cube = cubeMgr.findLatestSnapshot(realizationEntries, lookupTableName);
    olapContext.realization = cube;
}
{code}




[jira] [Comment Edited] (KYLIN-3628) Query with lookup table always use latest snapshot

2019-06-28 Thread Vsevolod Ostapenko (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874997#comment-16874997
 ] 

Vsevolod Ostapenko edited comment on KYLIN-3628 at 6/28/19 3:32 PM:


This code change introduces a nasty bug, where Kylin will pick a random cube to 
answer a query that goes against a lookup table.
For example, model X has two cubes, C1 and C2. C1 uses lookup table L1 and 
C2 does not. C2 has more recently built segments than C1.
A query "select * from L1" will fail with an error stating that C2 does not 
contain L1.

Code analysis indicates that LookupTableEnumerator overwrites the prior cube 
choice correctly made by RealizationChooser. The bug is that 
LookupTableEnumerator finds the latest snapshot across all the realizations of all 
the cubes in the model, instead of keeping the one that was already correctly chosen.
That leads to random behavior and unexpected failures.

The affected code is in LookupTableEnumerator.java:
{code:java}
if (olapContext.realization instanceof CubeInstance) {
    // Cube already chosen by RealizationChooser
    cube = (CubeInstance) olapContext.realization;
    ProjectInstance project = cube.getProjectInstance();
    List<RealizationEntry> realizationEntries = project.getRealizationEntries();
    String lookupTableName = olapContext.firstTableScan.getTableName();
    CubeManager cubeMgr = CubeManager.getInstance(cube.getConfig());
    // The prior choice is overwritten here with the result of findLatestSnapshot
    cube = cubeMgr.findLatestSnapshot(realizationEntries, lookupTableName);
    olapContext.realization = cube;
}
{code}




[jira] [Reopened] (KYLIN-3628) Query with lookup table always use latest snapshot

2019-06-28 Thread Vsevolod Ostapenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko reopened KYLIN-3628:
---

This code change introduces a nasty bug, where Kylin will pick a random cube to 
answer a query that goes against a lookup table.
For example, model X has two cubes, C1 and C2. C1 uses lookup table L1 and 
C2 does not. C2 has more recently built segments than C1.
A query "select * from L1" will fail with an error stating that C2 does not 
contain L1.

Code analysis indicates that LookupTableEnumerator overwrites the prior cube 
choice correctly made by RealizationChooser. The bug is that 
LookupTableEnumerator finds the latest snapshot across all the realizations of all 
the cubes in the model, instead of keeping the one that was already correctly chosen.
That leads to random behavior and unexpected failures.

The affected code is in LookupTableEnumerator.java:
{code:java}
if (olapContext.realization instanceof CubeInstance) {
    // Cube already chosen by RealizationChooser
    cube = (CubeInstance) olapContext.realization;
    ProjectInstance project = cube.getProjectInstance();
    List<RealizationEntry> realizationEntries = project.getRealizationEntries();
    String lookupTableName = olapContext.firstTableScan.getTableName();
    CubeManager cubeMgr = CubeManager.getInstance(cube.getConfig());
    // The prior choice is overwritten here with the result of findLatestSnapshot
    cube = cubeMgr.findLatestSnapshot(realizationEntries, lookupTableName);
    olapContext.realization = cube;
}
{code}
