liyang created KYLIN-3423:
-----------------------------

             Summary: Performance improvement in FactDistinctColumnsMapper
                 Key: KYLIN-3423
                 URL: https://issues.apache.org/jira/browse/KYLIN-3423
             Project: Kylin
          Issue Type: Improvement
            Reporter: liyang

Currently FactDistinctColumnsMapper writes every cell to the mapper output. Even with the mapper-side Combiner, we could de-duplicate better by using the available mapper memory.

The situation became worse after KYLIN-3370: not only dictionary columns but every dimension column is now written as mapper output.

Suggestions:
* Use available mapper memory to de-dup values before writing them to the mapper output.
* For non-dictionary dimension columns, write only the min/max values to the mapper output.
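
To make the two suggestions concrete, here is a rough sketch in plain Java with no Hadoop or Kylin dependencies. It is not the actual FactDistinctColumnsMapper code: the class name, the Output interface, the maxBufferedValues budget and the use of String comparison for min/max are all assumptions made for illustration. A real implementation would need type-aware comparison and a memory estimate rather than a simple value count.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; all names here are hypothetical, not Kylin APIs. */
final class MapperSideDedupSketch {

    /** Stand-in for the mapper output (hypothetical interface). */
    interface Output {
        void write(int columnIndex, String value);
    }

    private final boolean[] isDictionaryColumn;
    private final List<Set<String>> seen = new ArrayList<>();
    private final String[] min;
    private final String[] max;
    private final int maxBufferedValues; // assumed memory budget, counted in values
    private final Output out;
    private int buffered = 0;

    MapperSideDedupSketch(boolean[] isDictionaryColumn, int maxBufferedValues, Output out) {
        this.isDictionaryColumn = isDictionaryColumn.clone();
        this.maxBufferedValues = maxBufferedValues;
        this.out = out;
        this.min = new String[isDictionaryColumn.length];
        this.max = new String[isDictionaryColumn.length];
        for (int i = 0; i < isDictionaryColumn.length; i++) {
            seen.add(new HashSet<>());
        }
    }

    /** Called once per input row. */
    void map(String[] row) {
        for (int i = 0; i < row.length; i++) {
            String v = row[i];
            if (v == null) {
                continue;
            }
            if (isDictionaryColumn[i]) {
                // Write a value only the first time it is seen while remembered.
                if (seen.get(i).add(v)) {
                    out.write(i, v);
                    if (++buffered >= maxBufferedValues) {
                        forgetAll();
                    }
                }
            } else {
                // Non-dictionary column: track min/max only, write nothing yet.
                // Lexicographic String comparison is an assumption of this sketch.
                if (min[i] == null || v.compareTo(min[i]) < 0) {
                    min[i] = v;
                }
                if (max[i] == null || v.compareTo(max[i]) > 0) {
                    max[i] = v;
                }
            }
        }
    }

    /** Called once at the end of the mapper; emits min/max per non-dictionary column. */
    void cleanup() {
        for (int i = 0; i < isDictionaryColumn.length; i++) {
            if (isDictionaryColumn[i] || min[i] == null) {
                continue;
            }
            out.write(i, min[i]);
            if (!min[i].equals(max[i])) {
                out.write(i, max[i]);
            }
        }
    }

    /** Budget exceeded: drop the remembered values. Later repeats may be written again. */
    private void forgetAll() {
        for (Set<String> s : seen) {
            s.clear();
        }
        buffered = 0;
    }
}
```

When the budget is exceeded the sketch simply forgets what it has seen, so some duplicates can still reach the mapper output. That stays correct because the Combiner and reducer already de-duplicate; the in-memory set only reduces how much is written in the first place.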