Hi, George:

In kylin.properties, I found the following two parameters close to what you were suggesting:

## The cut size for hbase region, in GB.
kylin.storage.hbase.region-cut-gb=2
#
## The hfile size of GB, smaller hfile leading to the converting hfile MR has more reducers and be faster.
## Set 0 to disable this optimization.
kylin.storage.hbase.hfile-size-gb=1

I am wondering what the difference between them is, and whether both accept floating-point values and can be set to, say, 0.2.

Thanks.

Kang-sen

From: [email protected] <[email protected]> On Behalf Of nichunen
Sent: Tuesday, July 9, 2019 11:43 PM
To: [email protected]
Subject: Re: question. How to speed up convert cuboid to HFILE step.

Hi Kang-sen,

Yes, sorry for my typo; I meant the MR configs.

The number of reduce tasks in the “convert cuboid to HFILE” step is close to the region count of the cube's HBase table. So I suggest you set kylin.storage.hbase.region-cut-gb to a smaller number (it can be a float). I think this will increase the number of reduce tasks for this step.

Best regards,

Ni Chunen / George

On 07/8/2019 21:51, Lu, Kang-Sen <[email protected]> wrote:

Hi, George:

Thanks for your reply. I am not sure exactly how to change the Kylin config to improve the step converting cuboid to HFILE. Would you mind pointing me to the documentation so that I know exactly which parameter to modify?

In addition, do you really mean to adjust “kylin’s hive config”? The file for that should be kylin_hive_conf.xml, not kylin_job_conf.xml. But I’d rather believe the answer is in kylin_job_conf.xml, because it is likely the MapReduce config that may help to improve the performance.

Kang-sen

From: [email protected] <[email protected]> On Behalf Of nichunen
Sent: Thursday, July 4, 2019 10:25 PM
To: [email protected]
Subject: Re:question. How to speed up convert cuboid to HFILE step.

Hi Kang-sen,

You can adjust the configuration in Kylin's hive configuration file ($KYLIN_HOME/conf/kylin_job_conf.xml) to speed up the MR jobs.

Best regards,

Ni Chunen / George

On 04/12/2019 04:59, Lu, Kang-Sen <[email protected]> wrote:

I am running Kylin 2.5.1. When I build one hour’s cuboids, the step converting cuboid to HFILE took 8.33 minutes. Only 1 reduce task was created. Is there a way to start more reduce tasks?

The following is my kylin.properties file content:

#kylin.storage.hbase.region-cut-gb=5
kylin.storage.hbase.hfile-size-gb=1

Any suggestion is welcome.

Thanks.

Kang-sen

Log from kylin monitor step 13: (The data size is 1.26GB)

Counters: 50
    File System Counters
        FILE: Number of bytes read=966603663
        FILE: Number of bytes written=1996309058
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=662733811
        HDFS: Number of bytes written=1350608338
        HDFS: Number of read operations=199
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=5
    Job Counters
        Launched map tasks=48
        Launched reduce tasks=1
        Data-local map tasks=45
        Rack-local map tasks=3
        Total time spent by all maps in occupied slots (ms)=799000
        Total time spent by all reduces in occupied slots (ms)=822064
        Total time spent by all map tasks (ms)=799000
        Total time spent by all reduce tasks (ms)=411032
        Total vcore-milliseconds taken by all map tasks=799000
        Total vcore-milliseconds taken by all reduce tasks=411032
        Total megabyte-milliseconds taken by all map tasks=8999936000
        Total megabyte-milliseconds taken by all reduce tasks=9259728896
    Map-Reduce Framework
        Map input records=28693452
        Map output records=57386904
        Map output bytes=6215636946
        Map output materialized bytes=1020541171
        Input split bytes=11748
        Combine input records=0
        Combine output records=0
        Reduce input groups=57386904
        Reduce shuffle bytes=1020541171
        Reduce input records=57386904
        Reduce output records=57386904
        Spilled Records=114773808
        Shuffled Maps =48
        Failed Shuffles=0
        Merged Map outputs=48
        GC time elapsed (ms)=47218
        CPU time spent (ms)=1278720
        Physical memory (bytes) snapshot=130269708288
        Virtual memory (bytes) snapshot=585895706624
        Total committed heap usage (bytes)=149828927488
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=662722063
    File Output Format Counters
        Bytes Written=1350608338
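
To illustrate George's point that the reduce task count in this step tracks the region count, here is a back-of-the-envelope sketch in Python. It assumes, as a simplification, that the region count is roughly the cube size divided by kylin.storage.hbase.region-cut-gb, rounded up. The function name estimated_region_count is made up for this note, and Kylin's real calculation (which works from its own size estimate and also takes kylin.storage.hbase.hfile-size-gb into account) may well give different numbers.

```python
import math


def estimated_region_count(cube_size_gb: float, region_cut_gb: float) -> int:
    """Rough estimate only: one region per region_cut_gb of data, at least one.

    This is an assumed simplification, not Kylin's actual algorithm.
    """
    if region_cut_gb <= 0:
        raise ValueError("region_cut_gb must be positive")
    return max(1, math.ceil(cube_size_gb / region_cut_gb))


# 1.26 GB is the data size quoted with the step 13 log.
cube_size_gb = 1.26

# Try the commented-out value (5), the value from the later mail (2),
# and two fractional values such as the 0.2 asked about in the thread.
for region_cut_gb in (5, 2, 0.5, 0.2):
    regions = estimated_region_count(cube_size_gb, region_cut_gb)
    print(f"region-cut-gb={region_cut_gb}: about {regions} region(s)")
```

Under this assumption, 1.26 GB with a cut size of 5 or 2 stays at a single region, which is consistent with the single reduce task in the log, while a cut size of 0.5 or 0.2 would give roughly 3 or 7 regions.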
