One region can contain one or multiple HFiles, so "hfile-size-gb" should be
smaller than "region-cut-gb".

In the "convert to HFile" step, the number of reducers equals the number
of HFiles.

Too many regions will cause memory overhead in the HBase master, and I think
the same applies to the number of HFiles. So you can tweak the parameter, but
keep in mind the drawback it may bring.
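To make the relationship concrete, here is a rough back-of-envelope sketch (a hypothetical helper, not Kylin's actual sizing code) of how the two settings drive the region, HFile, and reducer counts:

```python
import math

def estimate_splits(cube_size_gb, region_cut_gb, hfile_size_gb):
    """Rough sketch (not Kylin's real algorithm): the cube is cut into
    regions of at most region_cut_gb, and each region is further cut into
    HFiles of at most hfile_size_gb. The "convert to HFile" MR step then
    gets roughly one reducer per HFile."""
    regions = max(1, math.ceil(cube_size_gb / region_cut_gb))
    hfiles_per_region = max(1, math.ceil(region_cut_gb / hfile_size_gb))
    hfiles = regions * hfiles_per_region
    return regions, hfiles  # reducers ~= hfiles

# Example: a 1.26 GB cube with the defaults quoted in this thread
print(estimate_splits(1.26, region_cut_gb=2, hfile_size_gb=1))      # (1, 2)
# Smaller cuts mean more HFiles, hence more reducers in parallel
print(estimate_splits(1.26, region_cut_gb=0.5, hfile_size_gb=0.2))  # (3, 9)
```

This also shows the tradeoff above: shrinking either value speeds up the conversion step but multiplies the regions/HFiles the HBase master must track.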



Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: [email protected]

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: [email protected]
Join Kylin dev mail group: [email protected]




Lu, Kang-Sen <[email protected]> wrote on Wed, Jul 10, 2019 at 8:00 PM:

> Hi, George:
>
>
>
> In kylin.properties, I found the following two parameters close to what
> you were suggesting:
>
>
>
> ## The cut size for hbase region, in GB.
>
> kylin.storage.hbase.region-cut-gb=2
>
> #
>
> ## The hfile size of GB, smaller hfile leading to the converting hfile MR
> has more reducers and be faster.
>
> ## Set 0 to disable this optimization.
>
> kylin.storage.hbase.hfile-size-gb=1
>
>
>
> I am wondering what is the difference between them, and are they both in
> float number type and can be set to, say, 0.2?
>
>
>
> Thanks.
>
>
>
> Kang-sen
>
>
>
> *From:* [email protected] <[email protected]> *On Behalf Of *nichunen
> *Sent:* Tuesday, July 9, 2019 11:43 PM
> *To:* [email protected]
> *Subject:* Re: question. How to speed up convert cuboid to HFILE step.
>
>
> ------------------------------
>
> NOTICE: This email was received from an EXTERNAL sender
> ------------------------------
>
>
>
> Hi Kang-sen,
>
>
>
> Yes, sorry for my typo, I mean the mr configs.
>
>
>
> The number of reduce tasks in the “convert cuboid to HFILE” step is close
> to the region count of the cube's HBase table. So I suggest you reduce
> kylin.storage.hbase.region-cut-gb to a smaller number (it can be a float);
> I think this will increase the number of reduce tasks for this step.
>
>
>
>
>
> Best regards,
>
>
>
> Ni Chunen / George
>
>
>
> On 07/8/2019 21:51, Lu, Kang-Sen <[email protected]> wrote:
>
> Hi, George:
>
>
>
> Thanks for your reply.
>
>
>
> I am not sure exactly how to change the Kylin config to improve the step
> that converts cuboid to HFILE. Would you mind pointing me to the document,
> so that I know exactly which parameter to modify?
>
>
>
> In addition, do you really mean to adjust “kylin’s hive config”? The file
> for that should be kylin_hive_conf.xml, not kylin_job_conf.xml. But I’d
> rather believe the answer is in kylin_job_conf.xml, because it is likely
> the mapreduce config that may help to improve the performance.
>
>
>
> Kang-sen
>
>
>
> *From:* [email protected] <[email protected]> *On Behalf Of *nichunen
> *Sent:* Thursday, July 4, 2019 10:25 PM
> *To:* [email protected]
> *Subject:* Re:question. How to speed up convert cuboid to HFILE step.
>
>
>
>
>
> Hi Kang-sen,
>
>
>
> You can adjust the configuration in Kylin's hive configuration file
> ($KYLIN_HOME/conf/kylin_job_conf.xml)  to speed up the MR jobs.
>
>
>
> Best regards,
>
>
>
> Ni Chunen / George
>
>
>
> On 04/12/2019 04:59, Lu, Kang-Sen <[email protected]> wrote:
>
> I am running kylin 2.5.1.
>
>
>
> When I build one hour’s cuboids, the step converting cuboid to HFILE took
> 8.33 minutes. Only 1 reduce task was created. Is there a way to start more
> reduce tasks?
>
>
>
> The following is my kylin.properties file content:
>
>
>
> #kylin.storage.hbase.region-cut-gb=5
>
> kylin.storage.hbase.hfile-size-gb=1
>
>
>
> Any suggestion is welcome.
>
>
>
> Thanks.
>
>
>
> Kang-sen
>
>
>
> Log from kylin monitor step 13: (The data size is 1.26GB)
>
>
>
> Counters: 50
>
>         File System Counters
>
>                FILE: Number of bytes read=966603663
>
>                FILE: Number of bytes written=1996309058
>
>                FILE: Number of read operations=0
>
>                FILE: Number of large read operations=0
>
>                FILE: Number of write operations=0
>
>                HDFS: Number of bytes read=662733811
>
>                HDFS: Number of bytes written=1350608338
>
>                HDFS: Number of read operations=199
>
>                HDFS: Number of large read operations=0
>
>                HDFS: Number of write operations=5
>
>         Job Counters
>
>                Launched map tasks=48
>
>                Launched reduce tasks=1
>
>                Data-local map tasks=45
>
>                Rack-local map tasks=3
>
>                Total time spent by all maps in occupied slots (ms)=799000
>
>                Total time spent by all reduces in occupied slots
> (ms)=822064
>
>                Total time spent by all map tasks (ms)=799000
>
>                Total time spent by all reduce tasks (ms)=411032
>
>                Total vcore-milliseconds taken by all map tasks=799000
>
>                Total vcore-milliseconds taken by all reduce tasks=411032
>
>                Total megabyte-milliseconds taken by all map
> tasks=8999936000
>
>                Total megabyte-milliseconds taken by all reduce
> tasks=9259728896
>
>         Map-Reduce Framework
>
>                Map input records=28693452
>
>                Map output records=57386904
>
>                Map output bytes=6215636946
>
>                Map output materialized bytes=1020541171
>
>                Input split bytes=11748
>
>                Combine input records=0
>
>                Combine output records=0
>
>                Reduce input groups=57386904
>
>                Reduce shuffle bytes=1020541171
>
>                Reduce input records=57386904
>
>                Reduce output records=57386904
>
>                Spilled Records=114773808
>
>                Shuffled Maps =48
>
>                Failed Shuffles=0
>
>                Merged Map outputs=48
>
>                GC time elapsed (ms)=47218
>
>                CPU time spent (ms)=1278720
>
>                Physical memory (bytes) snapshot=130269708288
>
>                Virtual memory (bytes) snapshot=585895706624
>
>                Total committed heap usage (bytes)=149828927488
>
>         Shuffle Errors
>
>                BAD_ID=0
>
>                CONNECTION=0
>
>                IO_ERROR=0
>
>                WRONG_LENGTH=0
>
>                WRONG_MAP=0
>
>                WRONG_REDUCE=0
>
>         File Input Format Counters
>
>                Bytes Read=662722063
>
>         File Output Format Counters
>
>                Bytes Written=1350608338
>
>
>
>
> ------------------------------
>
> Notice: This e-mail together with any attachments may contain information
> of Ribbon Communications Inc. that is confidential and/or proprietary for
> the sole use of the intended recipient. Any review, disclosure, reliance or
> distribution by others or forwarding without express permission is strictly
> prohibited. If you are not the intended recipient, please notify the sender
> immediately and then delete all copies, including any attachments.
> ------------------------------
>
>
