Hi,

Thanks for all the replies.

We are running Kylin 1.6.0 on Hadoop 2.7.2. The region.cut (5) and
hfile.size.gb (2) were left at their defaults. With about 230 million
records of data (about 800 MB of raw data), we built a cube of size 7.7 GB.
But in HBase, we could only find ONE region with ONE HFile in the table.

Sorry for not providing every detail. Our Kylin cluster is deployed in a
private datacenter with a very strict privacy policy, so I cannot share
specifics; I could not even take a photo of the screen. That's sad. :(

After some more investigation of the source code, we found that
HBaseMRSteps.createCreateHTableStep is actually invoked
in outputSide.addStepPhase2_BuildDictionary instead
of outputSide.addStepPhase3_BuildCube. Thanks to Billy for the reply.

But our problem is still there. What we observed is that
in CreateHTableJob, Kylin does the region split estimation based on cuboid
stats. In the log, we found that the estimated cuboid size did not exceed
the split size, so only one region was created. We therefore suspect there
is some bias between the actual cuboid size and the estimated cuboid size.
Maybe we could set kylin.job.cuboid.size.ratio
and kylin.job.cuboid.size.memhungry.ratio to increase the estimated cuboid
size. I hope someone can tell me whether I am on the right track.
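To make the suspicion above concrete, here is a rough Python sketch of the
kind of arithmetic involved. This is our own illustration, not Kylin's
actual code: the function name `estimate_region_count` and the exact
constants are invented for the example; the real logic lives in
CreateHTableJob. The point is that if the estimated cube size (raw size
times a ratio) stays below one region cut, a single region results, and
raising the ratio pushes the estimate over the cut.

```python
import math

# Hypothetical sketch (not Kylin's actual code) of region planning:
# estimated cube size = raw cuboid bytes * size ratio, then divided
# by the region cut size to get a region count.

def estimate_region_count(raw_cuboid_bytes, size_ratio, region_cut_gb):
    """Estimate how many regions a cube would be split into."""
    estimated_bytes = raw_cuboid_bytes * size_ratio
    region_cut_bytes = region_cut_gb * (1 << 30)
    # At least one region is always created.
    return max(1, math.ceil(estimated_bytes / region_cut_bytes))

# With a small ratio, ~800 MB of raw data is estimated well below a
# 5 GB cut, so only one region is planned:
print(estimate_region_count(800 * (1 << 20), 0.25, 5))  # -> 1
# A larger ratio inflates the estimate past the cut:
print(estimate_region_count(800 * (1 << 20), 30, 5))    # -> 5
```

This is why increasing kylin.job.cuboid.size.ratio should, in principle,
produce more regions: it inflates the estimate that the split is based on.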

Thanks.

On Wed, Jan 4, 2017 at 9:29 AM ShaoFeng Shi <[email protected]> wrote:

Tong, could you please provide some detailed information, like the
Kylin/Hadoop version, model/cube description, etc.? That would help us
analyze.

2017-01-03 19:59 GMT+08:00 Billy Liu <[email protected]>:

The default region.cut is 5, and default hfile.size.gb is 2. What's your
setting?
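For reference, the defaults mentioned above would appear in
kylin.properties roughly like this (assuming the 1.x property names used
elsewhere in this thread):

```properties
# GB of data per region at cube build time
kylin.hbase.region.cut=5
# GB per HFile produced by the convert-to-HFile step
kylin.hbase.hfile.size.gb=2
```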

2017-01-03 19:33 GMT+08:00 Billy Liu <[email protected]>:

Thanks Da Tong for the careful code check.
But actually, both BatchCubingJobBuilder and BatchCubingJobBuilder2 call
HBaseMRSteps.createCreateHTableStep. The CreateHTableJob step calculates
the regions from the split parameters.

2017-01-03 16:25 GMT+08:00 Da Tong <[email protected]>:

Hi,

We found that on Hadoop using MapReduce 2 with YARN, the number of HFiles
created by Kylin is always 1. After some investigation, we suspect that in
engine-mr, BatchCubingJobBuilder2 works differently from
BatchCubingJobBuilder. BatchCubingJobBuilder invokes
HBaseMRSteps.addSaveCuboidToHTableSteps, which includes calculating the
region size, but BatchCubingJobBuilder2 invokes
HBaseMRSteps.createConvertCuboidToHfileStep directly.
I am not sure whether this difference is by design. But what we see is
that we got a single 16 GB HFile in a single region even though we set
kylin.hbase.region.cut and kylin.hbase.hfile.size.gb.



-- 
TONG, Da / 佟达

-- 
Best regards,

Shaofeng Shi 史少锋

-- 
TONG, Da / 佟达
