Hi all,

This is Chase from Strikingly. We've recently run into a question about our usage of Apache Kylin, described below. We'd appreciate any suggestions :)
The question concerns estimating the resource and time cost of a cube build as a function of data scale. We currently have a build task that is triggered once per hour, and each build takes about 7-10 minutes on average. As our business grows, we need to plan a scale-up of our data platform before build times become too long. Specifically, we'd like a way to forecast the resources required to keep the same task's build time under 20 minutes if the data volume grows by, say, 100 times. Since we're not familiar with Kylin's underlying algorithms, we're not sure how it will actually perform on our dataset at that scale. Do the development team or other community members have any experience or suggestions here? Are there any articles on this specific problem?
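For what it's worth, one rough approach we've considered (not based on Kylin's internals, just a back-of-envelope extrapolation) is to fit our historical build times against input size and project forward. The sketch below uses hypothetical numbers and assumes build time follows a simple power law in row count, which may not hold for Kylin since cuboid count, dimension cardinality, and cluster parallelism all matter:

```python
import numpy as np

# Hypothetical historical measurements: (input rows, build minutes).
# These numbers are illustrative, not real data from our cluster.
rows = np.array([1e6, 2e6, 4e6, 8e6])
minutes = np.array([7.0, 8.5, 10.4, 12.8])

# Fit a power law: time = a * rows^b, i.e. a line on a log-log scale.
b, log_a = np.polyfit(np.log(rows), np.log(minutes), 1)

def predict_minutes(n_rows):
    """Extrapolate build time for a given input size."""
    return np.exp(log_a) * n_rows ** b

# Project the build time at 100x the current data volume.
current = 8e6
projected = predict_minutes(current * 100)
print(f"exponent b = {b:.2f}, projected build at 100x: {projected:.1f} min")
```

If the projected time exceeds our 20-minute budget, we would then look at how much extra capacity (e.g. more YARN executors) is needed to bring it back down. But whether build time actually scales sub-linearly like this, and how it responds to added resources, is exactly what we're unsure about, which is why we're asking here.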