[jira] [Commented] (KYLIN-3628) Query with lookup table always use latest snapshot
[ https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875378#comment-16875378 ]

XiaoXiang Yu commented on KYLIN-3628:
-------------------------------------

Thank you for reporting this, [~seva_ostapenko]. You have provided a very detailed analysis; I will check it.

> Query with lookup table always use latest snapshot
> --------------------------------------------------
>
>                 Key: KYLIN-3628
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3628
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: Na Zhai
>            Assignee: Na Zhai
>            Priority: Major
>             Fix For: v2.6.0
>
>
> If a user queries a lookup table, Kylin randomly selects a cube (one that has
> a snapshot of this lookup table) to answer it. This causes uncertainty when
> multiple cubes share the same lookup table: some cubes are newly built,
> some are not. If Kylin picks an old cube, the query result is stale.
> To remove this uncertainty, such queries should either always use the latest
> snapshot or always use the earliest snapshot. We believe the "latest" version is better.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
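The "always use the latest snapshot" rule from the issue description can be illustrated with a minimal sketch. It is not Kylin code: SnapshotCube, lastBuildTime, and pickLatest are hypothetical names, and the sketch assumes each candidate cube exposes the build time of its newest segment as a single number.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a cube that holds a lookup-table snapshot; not a Kylin class.
final class SnapshotCube {
    final String name;
    final long lastBuildTime; // assumed: epoch millis of the newest built segment

    SnapshotCube(String name, long lastBuildTime) {
        this.name = name;
        this.lastBuildTime = lastBuildTime;
    }
}

final class LatestSnapshotSketch {
    // Pick the most recently built cube instead of an arbitrary one.
    // Returns an empty Optional when no cube is supplied.
    static Optional<SnapshotCube> pickLatest(List<SnapshotCube> cubesWithSnapshot) {
        return cubesWithSnapshot.stream()
                .max(Comparator.comparingLong((SnapshotCube c) -> c.lastBuildTime));
    }

    public static void main(String[] args) {
        List<SnapshotCube> cubes = Arrays.asList(
                new SnapshotCube("old_cube", 1_000L),
                new SnapshotCube("new_cube", 2_000L));
        // Prints "new_cube": the cube with the larger build time wins.
        System.out.println(pickLatest(cubes).map(c -> c.name).orElse("none"));
    }
}
```

The sketch assumes every cube in the input list really holds a snapshot of the queried lookup table; it does not check that.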
[jira] [Comment Edited] (KYLIN-3628) Query with lookup table always use latest snapshot
[ https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874997#comment-16874997 ]

Vsevolod Ostapenko edited comment on KYLIN-3628 at 6/28/19 7:32 PM:
-------------------------------------------------------------------

This code change introduces a nasty bug: Kylin picks a random cube to answer a query against a lookup table.

For example, model X has two cubes, C1 and C2. C1 uses lookup table L1 and C2 does not. C2 has more recently built segments than C1. The query "select * from L1" fails with an error stating that C2 does not contain L1.

Code analysis indicates that LookupTableEnumerator overwrites the cube choice correctly made earlier by RealizationChooser. The bug is that LookupTableEnumerator looks for the latest snapshot across all realizations of all cubes in the model, instead of using the one that was already correctly chosen. That leads to random behavior and unexpected failures.

To address both the original issue (where a lookup table is snapshotted in multiple cubes and a suitable cube is picked without looking at segment build times) and the regression introduced by the change, CubeManager.findLatestSnapshot needs to check whether the lookup table is actually snapshotted as part of the cube realization. That way, if there is a mix of cubes that capture the lookup table and cubes that do not, only the ones that capture it are ranked by build time.

The affected file is CubeManager.java. The bug is in this check:
{code:java}
if (realization.getModel().isLookupTable(lookupTableName)) {
{code}
getModel().isLookupTable() operates at the model level, across all cubes, while the check needs to be scoped to the current cube only.
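The difference between the model-level check and a cube-scoped check can be sketched with the C1/C2 example from the comment above. This is an illustration only, not the Kylin implementation: DemoCube, its fields, and latest() are hypothetical stand-ins, and the sketch assumes each cube knows which lookup tables it snapshots and when it was last built.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-in for illustration only; not a Kylin class.
final class DemoCube {
    final String name;
    final long lastBuildTime;         // assumed: time of the newest built segment
    final Set<String> snapshotTables; // lookup tables this cube actually snapshots
    final Set<String> modelLookups;   // lookup tables declared on the shared model

    DemoCube(String name, long lastBuildTime, Set<String> snapshotTables, Set<String> modelLookups) {
        this.name = name;
        this.lastBuildTime = lastBuildTime;
        this.snapshotTables = snapshotTables;
        this.modelLookups = modelLookups;
    }
}

final class ScopedSnapshotSketch {
    // Rank only the cubes accepted by hasTable and return the newest one, or null if none qualifies.
    static DemoCube latest(List<DemoCube> cubes, Predicate<DemoCube> hasTable) {
        DemoCube best = null;
        for (DemoCube c : cubes) {
            if (hasTable.test(c) && (best == null || c.lastBuildTime > best.lastBuildTime)) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Set<String> modelLookups = Collections.singleton("L1");
        DemoCube c1 = new DemoCube("C1", 1_000L, Collections.singleton("L1"), modelLookups);
        DemoCube c2 = new DemoCube("C2", 2_000L, Collections.<String>emptySet(), modelLookups);
        List<DemoCube> cubes = Arrays.asList(c1, c2);

        // Model-level check: both cubes pass, so the newer C2 wins although it has no L1 snapshot.
        System.out.println(latest(cubes, c -> c.modelLookups.contains("L1")).name);   // prints C2
        // Cube-scoped check: only C1 passes, so C1 is chosen.
        System.out.println(latest(cubes, c -> c.snapshotTables.contains("L1")).name); // prints C1
    }
}
```

With the cube-scoped predicate, cubes that do not capture the lookup table never enter the ranking, which is the behavior the comment asks of CubeManager.findLatestSnapshot.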
[jira] [Comment Edited] (KYLIN-3628) Query with lookup table always use latest snapshot
[ https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874997#comment-16874997 ]

Vsevolod Ostapenko edited comment on KYLIN-3628 at 6/28/19 7:23 PM:
-------------------------------------------------------------------

This code change introduces a nasty bug: Kylin picks a random cube to answer a query against a lookup table.

For example, model X has two cubes, C1 and C2. C1 uses lookup table L1 and C2 does not. C2 has more recently built segments than C1. The query "select * from L1" fails with an error stating that C2 does not contain L1.

Code analysis indicates that LookupTableEnumerator overwrites the cube choice correctly made earlier by RealizationChooser. The bug is that LookupTableEnumerator looks for the latest snapshot across all realizations of all cubes in the model, instead of using the one that was already correctly chosen. That leads to random behavior and unexpected failures.

To address both the original issue (where a lookup table is snapshotted in multiple cubes and a suitable cube is picked without looking at segment build times) and the regression introduced by the change, CubeManager.findLatestSnapshot needs to check whether the lookup table is actually snapshotted as part of the cube realization. That way, if there is a mix of cubes that capture the lookup table and cubes that do not, only the ones that capture it are ranked by build time.

The affected file is CubeManager.java. The bug is in this check:
{code:java}
if (realization.getModel().isLookupTable(lookupTableName)) {
{code}
getModel operates across all cubes, while the check needs to be scoped to the current cube only.
[jira] [Comment Edited] (KYLIN-3628) Query with lookup table always use latest snapshot
[ https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874997#comment-16874997 ]

Vsevolod Ostapenko edited comment on KYLIN-3628 at 6/28/19 3:32 PM:
-------------------------------------------------------------------

This code change introduces a nasty bug: Kylin picks a random cube to answer a query against a lookup table.

For example, model X has two cubes, C1 and C2. C1 uses lookup table L1 and C2 does not. C2 has more recently built segments than C1. The query "select * from L1" fails with an error stating that C2 does not contain L1.

Code analysis indicates that LookupTableEnumerator overwrites the cube choice correctly made earlier by RealizationChooser. The bug is that LookupTableEnumerator looks for the latest snapshot across all realizations of all cubes in the model, instead of using the one that was already correctly chosen. That leads to random behavior and unexpected failures.

The affected code is in LookupTableEnumerator.java:
{code:java}
if (olapContext.realization instanceof CubeInstance) {
    cube = (CubeInstance) olapContext.realization;
    ProjectInstance project = cube.getProjectInstance();
    List realizationEntries = project.getRealizationEntries();
    String lookupTableName = olapContext.firstTableScan.getTableName();
    CubeManager cubeMgr = CubeManager.getInstance(cube.getConfig());
    cube = cubeMgr.findLatestSnapshot(realizationEntries, lookupTableName);
    // Overwrites the realization previously chosen by RealizationChooser.
    olapContext.realization = cube;
}
{code}
[jira] [Comment Edited] (KYLIN-3628) Query with lookup table always use latest snapshot
[ https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874997#comment-16874997 ]

Vsevolod Ostapenko edited comment on KYLIN-3628 at 6/28/19 3:32 PM:
-------------------------------------------------------------------

This code change introduces a nasty bug: Kylin picks a random cube to answer a query against a lookup table.

For example, model X has two cubes, C1 and C2. C1 uses lookup table L1 and C2 does not. C2 has more recently built segments than C1. The query "select * from L1" fails with an error stating that C2 does not contain L1.

Code analysis indicates that LookupTableEnumerator overwrites the cube choice correctly made earlier by RealizationChooser. The bug is that LookupTableEnumerator looks for the latest snapshot across all realizations of all cubes in the model, instead of using the one that was already correctly chosen. That leads to random behavior and unexpected failures.

The affected code is in LookupTableEnumerator.java:
{code:java}
if (olapContext.realization instanceof CubeInstance) {
    cube = (CubeInstance) olapContext.realization;
    ProjectInstance project = cube.getProjectInstance();
    List realizationEntries = project.getRealizationEntries();
    String lookupTableName = olapContext.firstTableScan.getTableName();
    CubeManager cubeMgr = CubeManager.getInstance(cube.getConfig());
    cube = cubeMgr.findLatestSnapshot(realizationEntries, lookupTableName);
    // Overwrites the realization previously chosen by RealizationChooser.
    olapContext.realization = cube;
}
{code}
[jira] [Reopened] (KYLIN-3628) Query with lookup table always use latest snapshot
[ https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vsevolod Ostapenko reopened KYLIN-3628:
---------------------------------------

This code change introduces a nasty bug: Kylin picks a random cube to answer a query against a lookup table.

For example, model X has two cubes, C1 and C2. C1 uses lookup table L1 and C2 does not. C2 has more recently built segments than C1. The query "select * from L1" fails with an error stating that C2 does not contain L1.

Code analysis indicates that LookupTableEnumerator overwrites the cube choice correctly made earlier by RealizationChooser. The bug is that LookupTableEnumerator looks for the latest snapshot across all realizations of all cubes in the model, instead of using the one that was already correctly chosen. That leads to random behavior and unexpected failures.

The affected code is in LookupTableEnumerator.java:
{code:java}
if (olapContext.realization instanceof CubeInstance) {
    cube = (CubeInstance) olapContext.realization;
    ProjectInstance project = cube.getProjectInstance();
    List realizationEntries = project.getRealizationEntries();
    String lookupTableName = olapContext.firstTableScan.getTableName();
    CubeManager cubeMgr = CubeManager.getInstance(cube.getConfig());
    cube = cubeMgr.findLatestSnapshot(realizationEntries, lookupTableName);
    // Overwrites the realization previously chosen by RealizationChooser.
    olapContext.realization = cube;
}
{code}