Hi, George:

In kylin.properties, I found the following two parameters close to what you 
were suggesting:

## The cut size for hbase region, in GB.
kylin.storage.hbase.region-cut-gb=2
#
## The hfile size of GB, smaller hfile leading to the converting hfile MR has 
## more reducers and be faster.
## Set 0 to disable this optimization.
kylin.storage.hbase.hfile-size-gb=1

What is the difference between them? Are they both floating-point values that 
can be set to, say, 0.2?

Thanks.

Kang-sen

From: [email protected] <[email protected]> On Behalf Of nichunen
Sent: Tuesday, July 9, 2019 11:43 PM
To: [email protected]
Subject: Re: question. How to speed up convert cuboid to HFILE step.

________________________________
NOTICE: This email was received from an EXTERNAL sender
________________________________

Hi Kang-sen,

Yes, sorry for my typo; I meant the MR configs.

The number of reduce tasks in the “convert cuboid to HFILE” step is close to 
the region count of the cube's HBase table. So I suggest you set 
kylin.storage.hbase.region-cut-gb to a smaller number (it can be a float). I 
think this will increase the number of reduce tasks for this step.
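
To illustrate the relationship described above, here is a minimal Python sketch. It assumes the region count is roughly the cube size divided by the region cut size, with at least one region. That formula and the function name `estimated_region_count` are illustrative assumptions, not Kylin's actual code; Kylin works from its own size estimate and may apply other settings, so treat the numbers as a rough guide only.

```python
import math


def estimated_region_count(cube_size_gb: float, region_cut_gb: float) -> int:
    """Rough estimate (assumption): one region per region_cut_gb of data, minimum one."""
    if region_cut_gb <= 0:
        raise ValueError("region_cut_gb must be positive")
    return max(1, math.ceil(cube_size_gb / region_cut_gb))


# 1.26 GB is the data size reported for step 13 in this thread.
cube_size_gb = 1.26

# 5 and 2 are region-cut values that appear in this thread; 0.2 is the value asked about.
for region_cut_gb in (5, 2, 0.2):
    regions = estimated_region_count(cube_size_gb, region_cut_gb)
    print(f"region-cut-gb={region_cut_gb}: about {regions} region(s)")
```

Under this simplified model, a 1.26 GB segment fits in a single region with a cut size of 5 or 2, which would be consistent with seeing only one reduce task, while a cut size of 0.2 would give several regions.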


Best regards,

Ni Chunen / George

On 07/8/2019 21:51, Lu, Kang-Sen <[email protected]> wrote:
Hi, George:

Thanks for your reply.

I am not sure exactly how to change the Kylin config to speed up the step that 
converts cuboids to HFILE. Would you mind pointing me to the documentation so 
that I know exactly which parameter to modify?

In addition, do you really mean to adjust “kylin’s hive config”? The file for 
that should be kylin_hive_conf.xml, not kylin_job_conf.xml. But I’d rather 
believe the answer is in kylin_job_conf.xml, because it is most likely the 
MapReduce config that would help improve the performance.

Kang-sen

From: [email protected] <[email protected]> On Behalf Of nichunen
Sent: Thursday, July 4, 2019 10:25 PM
To: [email protected]
Subject: Re:question. How to speed up convert cuboid to HFILE step.

Hi Kang-sen,

You can adjust the configuration in Kylin's hive configuration file 
($KYLIN_HOME/conf/kylin_job_conf.xml)  to speed up the MR jobs.

Best regards,

Ni Chunen / George

On 04/12/2019 04:59, Lu, Kang-Sen <[email protected]> wrote:
I am running kylin 2.5.1.

When I build one hour’s cuboids, the step that converts cuboids to HFILE takes 
8.33 minutes. Only 1 reduce task is created. Is there a way to start more 
reduce tasks?

The following is my kylin.properties file content:

#kylin.storage.hbase.region-cut-gb=5
kylin.storage.hbase.hfile-size-gb=1

Any suggestion is welcome.

Thanks.

Kang-sen

Log from kylin monitor step 13: (The data size is 1.26GB)

Counters: 50
        File System Counters
               FILE: Number of bytes read=966603663
               FILE: Number of bytes written=1996309058
               FILE: Number of read operations=0
               FILE: Number of large read operations=0
               FILE: Number of write operations=0
               HDFS: Number of bytes read=662733811
               HDFS: Number of bytes written=1350608338
               HDFS: Number of read operations=199
               HDFS: Number of large read operations=0
               HDFS: Number of write operations=5
        Job Counters
               Launched map tasks=48
               Launched reduce tasks=1
               Data-local map tasks=45
               Rack-local map tasks=3
               Total time spent by all maps in occupied slots (ms)=799000
               Total time spent by all reduces in occupied slots (ms)=822064
               Total time spent by all map tasks (ms)=799000
               Total time spent by all reduce tasks (ms)=411032
               Total vcore-milliseconds taken by all map tasks=799000
               Total vcore-milliseconds taken by all reduce tasks=411032
               Total megabyte-milliseconds taken by all map tasks=8999936000
               Total megabyte-milliseconds taken by all reduce tasks=9259728896
        Map-Reduce Framework
               Map input records=28693452
               Map output records=57386904
               Map output bytes=6215636946
               Map output materialized bytes=1020541171
               Input split bytes=11748
               Combine input records=0
               Combine output records=0
               Reduce input groups=57386904
               Reduce shuffle bytes=1020541171
               Reduce input records=57386904
               Reduce output records=57386904
               Spilled Records=114773808
               Shuffled Maps =48
               Failed Shuffles=0
               Merged Map outputs=48
               GC time elapsed (ms)=47218
               CPU time spent (ms)=1278720
               Physical memory (bytes) snapshot=130269708288
               Virtual memory (bytes) snapshot=585895706624
               Total committed heap usage (bytes)=149828927488
        Shuffle Errors
               BAD_ID=0
               CONNECTION=0
               IO_ERROR=0
               WRONG_LENGTH=0
               WRONG_MAP=0
               WRONG_REDUCE=0
        File Input Format Counters
               Bytes Read=662722063
        File Output Format Counters
               Bytes Written=1350608338
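
For reference, a few figures can be derived from the counters above with simple arithmetic. The Python sketch below does that; the function name `summarize` and the dictionary keys are illustrative, and the inputs are copied from the log. Note that the reduce time is the task's total running time, which may overlap with the map phase, so comparing it with the 8.33-minute step duration is only approximate.

```python
def summarize(reduce_ms: int, map_ms: int, map_tasks: int, reduce_input_records: int) -> dict:
    """Derive simple per-task figures from MapReduce job counters."""
    return {
        # Total time spent by all reduce tasks, in minutes (one reducer here).
        "reduce_minutes": reduce_ms / 60_000,
        # Average running time of a single map task, in seconds.
        "avg_map_seconds": map_ms / map_tasks / 1000,
        # Records handled per second by the reduce side.
        "reduce_records_per_second": reduce_input_records / (reduce_ms / 1000),
    }


# Values taken from the counters in the log above.
summary = summarize(
    reduce_ms=411032,               # Total time spent by all reduce tasks (ms)
    map_ms=799000,                  # Total time spent by all map tasks (ms)
    map_tasks=48,                   # Launched map tasks
    reduce_input_records=57386904,  # Reduce input records
)

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The single reduce task ran for about 6.85 minutes, while each of the 48 map tasks averaged well under a minute, which suggests the reduce side is where added parallelism would matter most.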


