Hi, thanks for all the replies.
We are running Kylin 1.6.0 on Hadoop 2.7.2. Both region.cut (5) and hfile.size.gb (2) were left at their defaults. From about 230 million records (roughly 800 MB of raw data) we built a cube of size 7.7 GB, but in HBase we could only find ONE region with ONE HFile in the table.

Sorry for not providing all the details. Our Kylin cluster is deployed in a private datacenter with a very strict privacy policy, so I cannot share specifics; I cannot even take a photo of the screen. That's sad. :(

After some more investigation of the source code, we found that HBaseMRSteps.createCreateHTableStep is actually invoked in outputSide.addStepPhase2_BuildDictionary rather than outputSide.addStepPhase3_BuildCube, thanks to Billy's reply. But our problem is still there.

What we observed is that in CreateHTableJob, Kylin does some estimation based on the cuboid statistics. In the log, we found that the estimated cuboid size never exceeded the split size, so there would be only one region. We therefore suspect there may be some bias between the actual cuboid size and the estimated cuboid size. Maybe we could try setting kylin.job.cuboid.size.ratio and kylin.job.cuboid.size.memhungry.ratio to increase the estimated cuboid size. I hope someone can tell me whether I am on the right track. Thanks.

On Wed, Jan 4, 2017 at 9:29 AM ShaoFeng Shi <[email protected]> wrote:

Tong, could you please provide some detailed information, such as the Kylin/Hadoop version and the model/cube description? That would help us analyze the issue.

2017-01-03 19:59 GMT+08:00 Billy Liu <[email protected]>:

The default region.cut is 5, and the default hfile.size.gb is 2. What are your settings?

2017-01-03 19:33 GMT+08:00 Billy Liu <[email protected]>:

Thanks Da Tong for the careful code check. Actually, both BatchCubingJobBuilder and BatchCubingJobBuilder2 call HBaseMRSteps.createCreateHTableStep, and the CreateHTableJob step calculates the regions from the split parameters.
2017-01-03 16:25 GMT+08:00 Da Tong <[email protected]>:

Hi, we found that on Hadoop using MapReduce 2 with YARN, the number of HFiles created by Kylin is always 1. After some investigation, we suspect that in engine-mr, BatchCubingJobBuilder2 works differently from BatchCubingJobBuilder: BatchCubingJobBuilder invokes HBaseMRSteps.addSaveCuboidToHTableSteps, which includes calculating the region size, while BatchCubingJobBuilder2 invokes HBaseMRSteps.createConvertCuboidToHfileStep directly. I am not sure whether this difference is by design. What we see is that we get a single 16 GB HFile in a single region even when we set kylin.hbase.region.cut and kylin.hbase.hfile.size.gb.

--
TONG, Da / 佟达

--
Best regards,

Shaofeng Shi 史少锋

--
TONG, Da / 佟达
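For anyone following along, the four settings discussed in this thread all live in kylin.properties. A minimal sketch of how they might be tuned is below; the values are illustrative guesses for experimentation, not recommendations, and only region.cut=5 and hfile.size.gb=2 are confirmed defaults (per Billy's reply above). Check the ratio defaults against your own Kylin 1.6.0 distribution before changing them.

```properties
# Cut a new region whenever the estimated cube size grows by this many GB
# (default 5, per Billy's reply)
kylin.hbase.region.cut=5

# Target size of each generated HFile in GB (default 2, per Billy's reply)
kylin.hbase.hfile.size.gb=2

# Scale factor applied to the estimated cuboid size; raising it inflates
# the estimate, so the split threshold is reached sooner and more regions
# are created (illustrative value, not a confirmed default)
kylin.job.cuboid.size.ratio=0.5

# Same idea for memory-hungry measures such as COUNT DISTINCT
# (illustrative value, not a confirmed default)
kylin.job.cuboid.size.memhungry.ratio=0.2
```

If the root cause is underestimation in CreateHTableJob, raising the ratios should produce more regions (and thus more HFiles) on the next cube build, which would confirm the theory.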
