One region can contain one or more HFiles, so "hfile-size-gb" should be smaller than "region-cut-gb".
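
As an illustration of that constraint, a kylin.properties fragment might look like the sketch below. The two property names come from this thread; the values are assumptions chosen only to show the relationship, not recommendations. George's earlier reply says region-cut-gb accepts a float. Whether hfile-size-gb also accepts a fractional value is assumed here and should be verified against your Kylin version.

```properties
# Illustrative values only (assumed, not tuned for any real cluster).

# Cut size for an HBase region, in GB. A smaller value produces more regions.
kylin.storage.hbase.region-cut-gb=1

# Target HFile size, in GB. Kept smaller than region-cut-gb so that one
# region can hold several HFiles. Fractional value is an assumption here.
kylin.storage.hbase.hfile-size-gb=0.5
```

With settings of this shape, each region would be expected to hold roughly two HFiles. The actual count depends on how Kylin estimates the cuboid size, which can differ from the final data size.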
In the "convert to HFile" step, the number of reducers equals the number of HFiles. Too many regions cause memory overhead in the HBase master, and I think the same applies to the number of HFiles. So you can tweak these parameters, but keep in mind the drawbacks that may bring.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: [email protected]

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: [email protected]
Join Kylin dev mail group: [email protected]

Lu, Kang-Sen <[email protected]> wrote on Wed, Jul 10, 2019 at 8:00 PM:

> Hi, George:
>
> In kylin.properties, I found the following two parameters close to what
> you were suggesting:
>
> ## The cut size for hbase region, in GB.
> kylin.storage.hbase.region-cut-gb=2
> #
> ## The hfile size of GB, smaller hfile leading to the converting hfile MR has more reducers and be faster.
> ## Set 0 to disable this optimization.
> kylin.storage.hbase.hfile-size-gb=1
>
> I am wondering what is the difference between them, and are they both in
> float number type and can be set to, say, 0.2?
>
> Thanks.
>
> Kang-sen
>
> *From:* [email protected] <[email protected]> *On Behalf Of* nichunen
> *Sent:* Tuesday, July 9, 2019 11:43 PM
> *To:* [email protected]
> *Subject:* Re: question. How to speed up convert cuboid to HFILE step.
>
> ------------------------------
> NOTICE: This email was received from an EXTERNAL sender
> ------------------------------
>
> Hi Kang-sen,
>
> Yes, sorry for my typo, I mean the mr configs.
>
> The number of reduce tasks in step “convert cuboid to HFILE step” is close
> to the region count of cube's hbase table. So I suggest you reduce the
> config of kylin.storage.hbase.region-cut-gb to a smaller number, it can be
> a float number, I think this will increase the reduce tasks’ number for
> this step.
>
> Best regards,
>
> Ni Chunen / George
>
> On 07/8/2019 21:51, Lu, Kang-Sen <[email protected]> wrote:
>
> Hi, George:
>
> Thanks for your reply.
>
> I am not sure exactly how to change kylin config to improve the step
> converting cuboid to HFILE. Do you mind point me to the document so that I
> know exact which parameter to modify.
>
> In addition, do you really mean to adjust “kylin’s hive config”? The file
> for that should be kylin_hive_conf.xml, not kylin_job_conf.xml. But I’d
> rather believe the answer is in kylin_job_conf.xml, because it is likely
> mapreduce config that may help to improve the performance.
>
> Kang-sen
>
> *From:* [email protected] <[email protected]> *On Behalf Of* nichunen
> *Sent:* Thursday, July 4, 2019 10:25 PM
> *To:* [email protected]
> *Subject:* Re:question. How to speed up convert cuboid to HFILE step.
>
> Hi Kang-sen,
>
> You can adjust the configuration in Kylin's hive configuration file
> ($KYLIN_HOME/conf/kylin_job_conf.xml) to speed up the MR jobs.
>
> Best regards,
>
> Ni Chunen / George
>
> On 04/12/2019 04:59, Lu, Kang-Sen <[email protected]> wrote:
>
> I am running kylin 2.5.1.
>
> When I build one hour’s cuboids, the step converting cuboid to HFILE took
> 8.33 minutes. Only 1 reduce task was created. Is there way to start more
> reduce tasks?
>
> The following is my kylin.properties file content:
>
> #kylin.storage.hbase.region-cut-gb=5
> kylin.storage.hbase.hfile-size-gb=1
>
> Any suggestion is welcome.
>
> Thanks.
>
> Kang-sen
>
> Log from kylin monitor step 13: (The data size is 1.26GB)
>
> Counters: 50
>   File System Counters
>     FILE: Number of bytes read=966603663
>     FILE: Number of bytes written=1996309058
>     FILE: Number of read operations=0
>     FILE: Number of large read operations=0
>     FILE: Number of write operations=0
>     HDFS: Number of bytes read=662733811
>     HDFS: Number of bytes written=1350608338
>     HDFS: Number of read operations=199
>     HDFS: Number of large read operations=0
>     HDFS: Number of write operations=5
>   Job Counters
>     Launched map tasks=48
>     Launched reduce tasks=1
>     Data-local map tasks=45
>     Rack-local map tasks=3
>     Total time spent by all maps in occupied slots (ms)=799000
>     Total time spent by all reduces in occupied slots (ms)=822064
>     Total time spent by all map tasks (ms)=799000
>     Total time spent by all reduce tasks (ms)=411032
>     Total vcore-milliseconds taken by all map tasks=799000
>     Total vcore-milliseconds taken by all reduce tasks=411032
>     Total megabyte-milliseconds taken by all map tasks=8999936000
>     Total megabyte-milliseconds taken by all reduce tasks=9259728896
>   Map-Reduce Framework
>     Map input records=28693452
>     Map output records=57386904
>     Map output bytes=6215636946
>     Map output materialized bytes=1020541171
>     Input split bytes=11748
>     Combine input records=0
>     Combine output records=0
>     Reduce input groups=57386904
>     Reduce shuffle bytes=1020541171
>     Reduce input records=57386904
>     Reduce output records=57386904
>     Spilled Records=114773808
>     Shuffled Maps =48
>     Failed Shuffles=0
>     Merged Map outputs=48
>     GC time elapsed (ms)=47218
>     CPU time spent (ms)=1278720
>     Physical memory (bytes) snapshot=130269708288
>     Virtual memory (bytes) snapshot=585895706624
>     Total committed heap usage (bytes)=149828927488
>   Shuffle Errors
>     BAD_ID=0
>     CONNECTION=0
>     IO_ERROR=0
>     WRONG_LENGTH=0
>     WRONG_MAP=0
>     WRONG_REDUCE=0
>   File Input Format Counters
>     Bytes Read=662722063
>   File Output Format Counters
>     Bytes Written=1350608338
>
> ------------------------------
> Notice: This e-mail together with any attachments may contain information
> of Ribbon Communications Inc. that is confidential and/or proprietary for
> the sole use of the intended recipient. Any review, disclosure, reliance or
> distribution by others or forwarding without express permission is strictly
> prohibited. If you are not the intended recipient, please notify the sender
> immediately and then delete all copies, including any attachments.
> ------------------------------
