Hi all,

This is Chase from Strikingly. We've recently run into a question about our usage of Apache Kylin, described below. We'd appreciate any suggestions :)
The question concerns estimating the resource and time cost of a cube build as a function of data scale. We currently have a build task that is triggered once per hour, and each build takes about 7-10 minutes on average. As our business grows, we need to plan a scale-up of our data platform before build times become too long. Specifically, we'd like a way to forecast the resources required to keep the same task's build time under 20 minutes if the data volume grows by, say, 100 times. Since we're not familiar with Kylin's underlying algorithms, we're not sure how it will actually perform on our dataset at that scale. Do the development team or other community members have any experience or suggestions here? Are there any articles on this specific problem?
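For what it's worth, one rough approach we've considered (not based on Kylin's internals, just a back-of-envelope extrapolation) is to fit our historical build times against input size and project forward. The sketch below uses hypothetical numbers and assumes build time follows a simple power law in row count, which may not hold for Kylin since cuboid count, dimension cardinality, and cluster parallelism all matter:

```python
import numpy as np

# Hypothetical historical measurements: (input rows, build minutes).
# These numbers are illustrative, not real data from our cluster.
rows = np.array([1e6, 2e6, 4e6, 8e6])
minutes = np.array([7.0, 8.5, 10.4, 12.8])

# Fit a power law: time = a * rows^b, i.e. a line on a log-log scale.
b, log_a = np.polyfit(np.log(rows), np.log(minutes), 1)

def predict_minutes(n_rows):
    """Extrapolate build time for a given input size."""
    return np.exp(log_a) * n_rows ** b

# Project the build time at 100x the current data volume.
current = 8e6
projected = predict_minutes(current * 100)
print(f"exponent b = {b:.2f}, projected build at 100x: {projected:.1f} min")
```

If the projected time exceeds our 20-minute budget, we would then look at how much extra capacity (e.g. more YARN executors) is needed to bring it back down. But whether build time actually scales sub-linearly like this, and how it responds to added resources, is exactly what we're unsure about, which is why we're asking here.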