Hi hongyue,
    Thank you for sharing your experience with this use case. I am glad to
hear that our joint effort solved the problem to some extent.
    Question 3 is really interesting, but finding a truly smart solution may
be difficult (and error-prone). I hope someone comes up with a better idea
in the future.


--

Best wishes to you ! 
From :Xiaoxiang Yu



At 2020-01-21 15:21:58, "毛洪玥" <[email protected]> wrote:



Hi all,


   We recently went live with the "Force Hit Cube or Hybrid" feature, both
back-end and front-end, based on issue KYLIN-4312, which was solved by
@Xiaoxiang Yu. According to the plan, it will be available in the next
release. We have some questions, listed below:


[Background]
   After the patch is applied, the Kylin web UI looks like pic1 (with a
drop-down box on the "Insight" page that lets users choose the cube for
their query):
   
   There are two main use cases for this feature in our company:
       1. Force the choice of the cheapest correct cube. In our team, we
build several smaller cubes rather than a SINGLE larger cube to reduce build
duration and cube storage. For example, we build three small cubes: the first
with three dimensions "ABC", the second with three dimensions "ADE", and the
third with five dimensions "ADHGF", rather than one bigger cube with eight
dimensions "ABCDEHGF". Because the cuboid "ABCDEHGF" is removed, our design
should in theory reduce total storage a lot (although this depends on the
specific usage scenario). However, this design raises a new problem. UserA
creates Cube1 (with three dimensions A,B,C) and builds it from 2020.01.07 to
now, and UserB creates Cube2 (with five dimensions A,D,H,G,F) and builds it
from 2020.01.05 to now. When UserB runs "select A,count(*) from db.table
group by A;", the query hits Cube1 because it has fewer dimensions/measures,
so the results from 01.05 to 01.07 are missing. To fix this problem, we have
to force Cube2 to answer this query.
       2. Testing and debugging. We usually clone a new cube from an
existing one, make some changes (for example, add some new configuration),
and then build a few new segments to test the newly added feature. But this
causes a cube conflict when both cubes become READY, which leads to wrong
online results (and may mislead the QA team).
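
To make use case 1 concrete, here is a minimal Python sketch of the two
effects described above. It is only an illustration, not Kylin code: the
cuboid count is the unpruned upper bound 2^n (no aggregation groups or other
pruning assumed), and the routing rule "fewest dimensions wins" is a
simplified stand-in for Kylin's cost-based cube selection. The names
max_cuboids, route and cubes are hypothetical.

```python
from datetime import date

def max_cuboids(dimensions: str) -> int:
    """Upper bound on cuboids for a cube, assuming no pruning at all."""
    return 2 ** len(dimensions)

# Three small cubes versus one large cube (dimension names from the mail).
small_total = sum(max_cuboids(d) for d in ["ABC", "ADE", "ADHGF"])
large_total = max_cuboids("ABCDEHGF")

# Hypothetical cube metadata: dimensions and the first built day.
cubes = {
    "Cube1": {"dims": set("ABC"), "start": date(2020, 1, 7)},
    "Cube2": {"dims": set("ADHGF"), "start": date(2020, 1, 5)},
}

def route(query_dims: set) -> str:
    """Simplified routing: among cubes containing all query dimensions,
    pick the one with the fewest dimensions. Data coverage is ignored,
    which is exactly what causes the missing days."""
    candidates = [name for name, c in cubes.items() if query_dims <= c["dims"]]
    return min(candidates, key=lambda name: len(cubes[name]["dims"]))

chosen = route({"A"})
earliest = min(c["start"] for c in cubes.values())
missing_days = (cubes[chosen]["start"] - earliest).days

print(small_total, large_total)   # cuboid upper bounds for the two designs
print(chosen, missing_days)       # cube hit by the query, days of data lost
```

Under these assumptions the three small cubes have at most 48 cuboids in
total against 256 for the single large cube, and the query on A is routed to
Cube1, which lacks the two days that only Cube2 has built.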


[Questions]
        1. Will the design we chose in use case 1 cause other problems we
have not foreseen? For example, will building several smaller cubes take
longer and cost more YARN resources than building a single larger cube?
        2. For online testing, is there a better solution?
        3. When a cube is chosen forcibly in this way, we can no longer use
Kylin's automatic cube routing strategy, which finds the most suitable cube
for a query automatically. For use case 1, suppose Cube1 (with three
dimensions A,B,C) and Cube2 (with five dimensions A,D,H,G,F) have the same
segment, so both can answer the query "select A,count(*) from db.table
where date='2020.01.08' group by A". Cube2 will be chosen because we force
hit it, but Cube1 has fewer dimensions/measures and may also have the
exact-match cuboid for this query, so we would prefer Cube1 for a faster
result rather than the cube we forced. Is there a better way for us to find
the cheaper cube that still returns the right query result?
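
One possible direction for Question 3, sketched in Python purely as an
illustration: keep only the cubes whose built date range covers the range
the query filters on, and then pick the cheapest among them. This is not an
existing Kylin feature or API. CubeInfo and pick_cube are hypothetical names,
"cheapest" is approximated by the number of dimensions, "to now" is
represented by an assumed date of 2020.01.21, and the sketch ignores
measures, segment gaps and hybrids.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Set

@dataclass(frozen=True)
class CubeInfo:
    name: str
    dims: frozenset      # dimension names of the cube
    built_from: date     # first day covered by built segments
    built_to: date       # last day covered by built segments

def pick_cube(cubes: List[CubeInfo], query_dims: Set[str],
              day_from: date, day_to: date) -> Optional[str]:
    """Return the cheapest cube that can answer the query correctly,
    or None if no cube has both the dimensions and the data."""
    usable = [
        c for c in cubes
        if query_dims <= c.dims                      # has all dimensions
        and c.built_from <= day_from                 # covers the start day
        and day_to <= c.built_to                     # covers the end day
    ]
    if not usable:
        return None
    # Fewer dimensions is used as a rough proxy for a cheaper scan.
    return min(usable, key=lambda c: len(c.dims)).name

today = date(2020, 1, 21)  # assumed stand-in for "now"
cubes = [
    CubeInfo("Cube1", frozenset("ABC"), date(2020, 1, 7), today),
    CubeInfo("Cube2", frozenset("ADHGF"), date(2020, 1, 5), today),
]

# A day both cubes cover: the cheaper Cube1 can be used safely.
on_jan_08 = pick_cube(cubes, {"A"}, date(2020, 1, 8), date(2020, 1, 8))
# A day only Cube2 covers: Cube1 must be excluded.
on_jan_06 = pick_cube(cubes, {"A"}, date(2020, 1, 6), date(2020, 1, 6))
print(on_jan_08, on_jan_06)
```

With this rule the query for 2020.01.08 would go to Cube1, while a query for
2020.01.06 would fall back to Cube2. A query without a date filter would have
to be treated as covering the full range, which again selects Cube2.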
