[jira] [Updated] (KYLIN-3423) Performance improvement in FactDistinctColumnsMapper
[ https://issues.apache.org/jira/browse/KYLIN-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyang updated KYLIN-3423:
--
Description:
Currently FactDistinctColumnsMapper writes every cell to the mapper output. Despite the mapper-side Combiner, we could de-dup better by using the available mapper memory.
The situation became worse after KYLIN-3370: now not only dictionary columns but every dimension column is written to the mapper output.
Suggestions:
* For non-dictionary dimension columns, write only the min/max values to the mapper output.

was:
Currently FactDistinctColumnsMapper writes every cell to the mapper output. Despite the mapper-side Combiner, we could de-dup better by using the available mapper memory.
The situation became worse after KYLIN-3370: now not only dictionary columns but every dimension column is written to the mapper output.
Suggestions:
* Use the available mapper memory to de-dup before writing to the mapper output.
* For non-dictionary dimension columns, write only the min/max values to the mapper output.

> Performance improvement in FactDistinctColumnsMapper
>
> Key: KYLIN-3423
> URL: https://issues.apache.org/jira/browse/KYLIN-3423
> Project: Kylin
> Issue Type: Improvement
> Reporter: liyang
> Assignee: Shaoxiong Zhan
> Priority: Major

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
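The two suggestions in KYLIN-3423 (de-dup in mapper memory before writing, and writing only min/max for non-dictionary dimension columns) can be illustrated with a small Java sketch. This is hypothetical and not Kylin's actual FactDistinctColumnsMapper code: the names `MapperSideDedup`, `DistinctBuffer` and `MinMaxTracker`, the capacity-based flush policy, the `BiConsumer` standing in for `context.write(...)`, and the use of plain string ordering for min/max are all assumptions.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.BiConsumer;

// Hypothetical sketch only; not the real FactDistinctColumnsMapper.
final class MapperSideDedup {

    private MapperSideDedup() {}

    /**
     * Holds distinct values of one dictionary column in mapper memory and
     * emits them when the buffer reaches its capacity (or on an explicit flush).
     * A value seen again after a flush is emitted again, so the combiner and
     * reducer must still perform the final de-dup.
     */
    static final class DistinctBuffer {
        private final int columnIndex;
        private final int capacity; // assumed memory budget, counted in distinct values
        private final BiConsumer<Integer, String> emitter; // stands in for context.write(key, value)
        private final Set<String> values = new HashSet<>();

        DistinctBuffer(int columnIndex, int capacity, BiConsumer<Integer, String> emitter) {
            if (capacity <= 0) {
                throw new IllegalArgumentException("capacity must be positive");
            }
            this.columnIndex = columnIndex;
            this.capacity = capacity;
            this.emitter = emitter;
        }

        void add(String value) {
            // Set.add returns false when the value is already buffered: nothing new to emit.
            if (values.add(value) && values.size() >= capacity) {
                flush();
            }
        }

        /** Emits each buffered value once, then releases the memory. */
        void flush() {
            for (String v : values) {
                emitter.accept(columnIndex, v);
            }
            values.clear();
        }
    }

    /**
     * For a non-dictionary dimension column, keeps only the min and max value.
     * Assumes values order correctly as strings; real column types would need
     * a type-aware comparator.
     */
    static final class MinMaxTracker {
        private String min;
        private String max;

        void add(String value) {
            if (min == null || value.compareTo(min) < 0) {
                min = value;
            }
            if (max == null || value.compareTo(max) > 0) {
                max = value;
            }
        }

        String min() { return min; }

        String max() { return max; }
    }
}
```

In a Hadoop mapper, the remaining buffered values would also need to be flushed from `cleanup()`. The bounded buffer trades a few repeated emits across flushes for a fixed memory ceiling, which is why it complements the Combiner rather than replacing it.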
[jira] [Commented] (KYLIN-3421) Improve job scheduler fetch performance
[ https://issues.apache.org/jira/browse/KYLIN-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520060#comment-16520060 ]

Zhong Yanghong commented on KYLIN-3421:
---
Hi [~kangkaisen], UT & IT have been fixed.

> Improve job scheduler fetch performance
> ---
>
> Key: KYLIN-3421
> URL: https://issues.apache.org/jira/browse/KYLIN-3421
> Project: Kylin
> Issue Type: Improvement
> Components: Job Engine
> Reporter: Zhong Yanghong
> Assignee: Zhong Yanghong
> Priority: Major
> Attachments: APACHE-KYLIN-3421.patch
>
>
> Currently, there are several improvements we can make to the job scheduler:
> * {{DistributedScheduler}} does not check {{isReady()}} for checkpoint jobs.
> * Since many of the executables fetched from HBase are not in the READY state, it's better to fetch their output first, which can avoid many get requests to HBase.
> * {{FetcherRunnerWithPriority}} is not enabled in {{DistributedScheduler}}; enabling it requires changing the lockPath from *{color:#f79232}segmentId{color}* to *{color:#f79232}jobId{color}*.
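The second improvement in KYLIN-3421, checking each job's output before loading the job itself, can be sketched as follows. It is a hypothetical illustration: `OutputFirstFetcher`, the `State` enum, `JobStore` and its two methods are invented stand-ins for the HBase-backed metadata store, not Kylin's actual job-store API, and `getAllOutputStates()` is assumed to be a single batched read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "fetch output first"; names are illustrative only.
final class OutputFirstFetcher {

    /** Assumed subset of job states, for illustration. */
    enum State { READY, RUNNING, SUCCEED, ERROR, DISCARDED }

    /** Stand-in for the HBase-backed metadata store. */
    interface JobStore {
        Map<String, State> getAllOutputStates(); // assumed: one batched read of job outputs
        String getExecutable(String jobId);      // assumed: one get request per call
    }

    private OutputFirstFetcher() {}

    /** Loads full job definitions only for jobs whose output state is READY. */
    static List<String> fetchReadyJobs(JobStore store) {
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, State> e : store.getAllOutputStates().entrySet()) {
            if (e.getValue() == State.READY) {
                ready.add(store.getExecutable(e.getKey()));
            }
        }
        return ready;
    }
}
```

With this ordering, a job that has already succeeded, failed, or been discarded costs only its share of the batched state read instead of a separate executable get.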
[jira] [Updated] (KYLIN-3421) Improve job scheduler fetch performance
[ https://issues.apache.org/jira/browse/KYLIN-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhong Yanghong updated KYLIN-3421:
--
Attachment: APACHE-KYLIN-3421.patch
[jira] [Updated] (KYLIN-3421) Improve job scheduler fetch performance
[ https://issues.apache.org/jira/browse/KYLIN-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhong Yanghong updated KYLIN-3421:
--
Attachment: (was: APACHE-KYLIN-3421.patch)