Hi hongyue,
Thank you for sharing your experience with this use case. I am glad to hear
that our joint effort solved the problem to some extent.
Question 3 is really interesting, but finding a truly smart solution may be
difficult (and error-prone). I hope someone comes up with a better idea in
the future.
--
Best wishes to you !
From :Xiaoxiang Yu
At 2020-01-21 15:21:58, "毛洪玥" <[email protected]> wrote:
Hi all,
Recently we went live with the "Force Hit Cube or Hybrid" feature, both
back-end and front-end, based on issue KYLIN-4312, which was solved by
@Xiaoxiang Yu. According to the plan, it will be available in the next
release. We have some questions, listed below:
[Background]
After the patch is applied, the Kylin web UI looks like pic1 (with a
drop-down box on the "Insight" page that lets users choose the cube for
their query).
There are two main use cases for this feature in our company:
1. Force the choice of the cheapest cube. In our team, we choose to build
several smaller cubes rather than a SINGLE larger cube to reduce build
duration and cube storage. For example, we build three small cubes: the first
with three dimensions "ABC", the second with three dimensions "ADE", and the
third with five dimensions "ADHGF", rather than one bigger cube with eight
dimensions "ABCDEHGF". Because the cuboid "ABCDEHGF" is removed, our design
should in theory reduce total storage a lot (although this depends on the
specific usage scenario). However, this design causes a new problem. UserA
created Cube1 (with three dimensions A,B,C) and has built it from 2020.01.07
to now, and UserB created Cube2 (with five dimensions A,D,H,G,F) and has built
it from 2020.01.05 to now. When UserB runs "select A,count(*) from db.table
group by A;", the query hits Cube1 because it has fewer dimensions/measures,
so the results for 01.05 to 01.07 are missing. To fix this problem, we have
to force Cube2 to answer the query.
2. Testing and debugging. We usually clone a new cube from an existing
one, make some changes (for example, add some new configuration), and then
build some new segments to test the newly added feature. But this causes a
cube conflict when both cubes become READY, which leads to wrong online
results (and may mislead the QA team).
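The routing problem in use case 1 can be illustrated with a toy model. This
is only a sketch, not Kylin's actual routing code: the Cube class, the route
function, and the "fewest dimensions wins" cost rule are simplifying
assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Cube:
    name: str
    dimensions: frozenset
    start: date  # first day covered by built segments (hypothetical field)


def route(cubes, query_dims, forced=None):
    """Return the forced cube if given, else the capable cube with the fewest dimensions."""
    # A cube is capable if it contains every dimension the query needs.
    candidates = [c for c in cubes if query_dims <= c.dimensions]
    if forced is not None:
        candidates = [c for c in candidates if c.name == forced]
    if not candidates:
        raise LookupError("no cube can answer this query")
    return min(candidates, key=lambda c: len(c.dimensions))


cube1 = Cube("Cube1", frozenset("ABC"), date(2020, 1, 7))
cube2 = Cube("Cube2", frozenset("ADHGF"), date(2020, 1, 5))
cubes = [cube1, cube2]

# "select A, count(*) ... group by A" only needs dimension A.
default_choice = route(cubes, frozenset("A"))
forced_choice = route(cubes, frozenset("A"), forced="Cube2")

print(default_choice.name, default_choice.start)  # Cube1 2020-01-07
print(forced_choice.name, forced_choice.start)    # Cube2 2020-01-05
```

In this model the default route picks Cube1, whose data only starts on
2020.01.07, while forcing Cube2 returns data starting from 2020.01.05.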
[Questions]
1. Will the design we chose in use case 1 cause other problems we have
not thought of? For example, will building several smaller cubes take longer
and cost more YARN resources than building a single larger cube?
2. For online testing, is there a better solution?
3. When a cube is forcibly chosen in this way, we can no longer use
Kylin's automatic cube routing strategy, which finds the most suitable cube
for a query automatically. For use case 1, suppose Cube1 (with three
dimensions A,B,C) and Cube2 (with five dimensions A,D,H,G,F) have the same
segments, so both can answer a query such as "select A,count(*) from
db.table where date='2020.01.08' group by A". Cube2 will be chosen because we
force-hit it, but Cube1 has fewer dimensions/measures and may also have an
exact-match cuboid for this query, so we would prefer Cube1 for a faster
result rather than the cube we forced. Is there a better way for us to find
the cheapest cube that still returns the right query result?
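One possible direction for Question 3, sketched as a toy model only: restrict
the candidates to cubes whose built date range covers the range the query
asks for, and then pick the cheapest of those. The CubeInfo class, the
cheapest_covering function, the single contiguous date range per cube, and
the end date 2020.01.21 are all assumptions for illustration; the start dates
are borrowed from use case 1. Real routing would also have to deal with
segment gaps, measures, and hybrids.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CubeInfo:
    name: str
    dimensions: frozenset
    start: date  # first day with built data, inclusive (hypothetical field)
    end: date    # last day with built data, inclusive (hypothetical field)


def cheapest_covering(cubes, query_dims, q_start, q_end):
    """Among cubes that have the dimensions and cover [q_start, q_end],
    return the one with the fewest dimensions, or None if none qualifies."""
    ok = [
        c for c in cubes
        if query_dims <= c.dimensions and c.start <= q_start and q_end <= c.end
    ]
    return min(ok, key=lambda c: len(c.dimensions)) if ok else None


cubes = [
    CubeInfo("Cube1", frozenset("ABC"), date(2020, 1, 7), date(2020, 1, 21)),
    CubeInfo("Cube2", frozenset("ADHGF"), date(2020, 1, 5), date(2020, 1, 21)),
]

# A query for 2020.01.08 only: both cubes cover it, so the smaller one wins.
single_day = cheapest_covering(cubes, frozenset("A"),
                               date(2020, 1, 8), date(2020, 1, 8))
# A query for the full range: only Cube2 has data back to 2020.01.05.
full_range = cheapest_covering(cubes, frozenset("A"),
                               date(2020, 1, 5), date(2020, 1, 21))

print(single_day.name)  # Cube1
print(full_range.name)  # Cube2
```

This only works when the query carries an explicit date filter; a query
without one would need some other rule, which is part of why a general
solution is hard.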