[jira] [Commented] (KYLIN-2217) Reducers build dictionaries locally

2017-01-22 Thread XIE FAN (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15833450#comment-15833450
 ] 

XIE FAN commented on KYLIN-2217:


I will improve this once I finish KYLIN-2374. Thank you.

> Reducers build dictionaries locally
> ---
>
> Key: KYLIN-2217
> URL: https://issues.apache.org/jira/browse/KYLIN-2217
> Project: Kylin
> Issue Type: Improvement
> Affects Versions: v1.5.4.1
> Reporter: XIE FAN
> Assignee: XIE FAN
> Fix For: v2.0.0
>
> Attachments: 0001-KYLIN-2217-Reducers-build-dictionaries-locally.patch
>
>
> In KYLIN-1851, we reduced the peak memory usage of the dictionary-building
> procedure by splitting a single trie structure into a trie forest. But a
> bottleneck remains: all the dictionaries are built in the Kylin client.
> In this issue, we want to use multiple reducers to build different
> dictionaries locally and concurrently, which can further reduce the peak
> memory usage as well as speed up the dictionary-building procedure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-2217) Reducers build dictionaries locally

2017-01-21 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15833346#comment-15833346
 ] 

Shaofeng SHI commented on KYLIN-2217:
-

Hi [~xiefan46], there is room for improvement in the
FactDistinctColumnsReducer. Currently, once
"kylin.engine.mr.uhc-reducer-count" > 1, it does not build a dictionary for
any column. This is not good, as usually only UHC columns are distributed to
multiple reducers. Normal dimensions still use 1 reducer, so it is okay to
build their dictionaries here. Could you please make a further enhancement?
Thanks!
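
The per-column rule requested above can be sketched as follows. This is a
minimal illustration, not Kylin's FactDistinctColumnsReducer code: the names
ColumnPlan, reducers_for and build_dict_in_reducer are hypothetical, and the
assumption is that only UHC columns are spread over
"kylin.engine.mr.uhc-reducer-count" reducers while every other column uses one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnPlan:
    """Hypothetical description of one column (not a Kylin class)."""
    name: str
    is_uhc: bool  # ultra-high-cardinality column


def reducers_for(column: ColumnPlan, uhc_reducer_count: int) -> int:
    """Assumed policy: only UHC columns are spread over several reducers."""
    return uhc_reducer_count if column.is_uhc else 1


def build_dict_in_reducer(column: ColumnPlan, uhc_reducer_count: int) -> bool:
    """Build locally only when one reducer sees all values of the column."""
    return reducers_for(column, uhc_reducer_count) == 1


columns = [ColumnPlan("country", False), ColumnPlan("user_id", True)]
plan = {c.name: build_dict_in_reducer(c, uhc_reducer_count=3) for c in columns}
```

With this rule the decision is made per column rather than globally, so a
normal dimension keeps building its dictionary in the reducer even when the
UHC reducer count is greater than 1.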



[jira] [Commented] (KYLIN-2217) Reducers build dictionaries locally

2017-01-19 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829663#comment-15829663
 ] 

Shaofeng SHI commented on KYLIN-2217:
-

I see; then let the job engine build the dict when the values are dispatched
to multiple reducers (files).



[jira] [Commented] (KYLIN-2217) Reducers build dictionaries locally

2017-01-15 Thread XIE FAN (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823176#comment-15823176
 ] 

XIE FAN commented on KYLIN-2217:


Yes, but this kind of merge is a costly procedure and is equivalent to
rebuilding the whole dictionary. If we merge dictionaries in the job node, it
is the same as building the dictionary twice (in the reducers and in the job
node).
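
A small sketch of what such a merge involves, under the simplifying
assumption that each dictionary is a sorted array of distinct values whose ID
is the array index (merge_dicts is a hypothetical helper, not a Kylin API).
It only illustrates that a merge visits every value again and assigns fresh
IDs; it says nothing about memory footprint.

```python
import heapq


def merge_dicts(*sorted_dicts):
    """Stream-merge sorted dictionaries, dropping duplicate values."""
    merged = []
    touched = 0  # how many input values the merge had to look at
    for value in heapq.merge(*sorted_dicts):
        touched += 1
        if not merged or merged[-1] != value:
            merged.append(value)
    return merged, touched


part_a = ["apple", "mango", "zebra"]   # built by one reducer
part_b = ["banana", "kiwi", "mango"]   # built by another reducer
merged, touched = merge_dicts(part_a, part_b)

# IDs are positions in the merged array, so they differ from the local IDs.
new_ids = {value: i for i, value in enumerate(merged)}
```

Here "mango" had ID 1 in one input and ID 2 in the other, and ends up with a
different ID in the merged dictionary, which is why the locally assigned IDs
cannot be reused.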



[jira] [Commented] (KYLIN-2217) Reducers build dictionaries locally

2017-01-15 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823131#comment-15823131
 ] 

Shaofeng SHI commented on KYLIN-2217:
-

[~xiefan46] multiple small dictionaries can be merged into a bigger one; this
is what Kylin does when merging multiple segments. In the merge phase all
values are reordered.



[jira] [Commented] (KYLIN-2217) Reducers build dictionaries locally

2017-01-14 Thread XIE FAN (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822820#comment-15822820
 ] 

XIE FAN commented on KYLIN-2217:


Dictionaries cannot be merged after building because they have been flattened
into an array and their structure cannot be changed. And we cannot simply
merge those arrays together because we cannot ensure the order between values
in different dictionaries. But I can modify the code so that columns using
multiple reducers do not build dictionaries locally. In that case, the
dictionary-building procedure will be left to the job node.
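
The ordering problem can be shown with a toy model. The sketch below assumes
an order-preserving dictionary represented as a sorted array whose index is
the ID; this is a simplification for illustration, not Kylin's actual trie
layout, and build_dict is a hypothetical helper.

```python
def build_dict(values):
    """Order-preserving toy dictionary: sorted distinct values, ID = index."""
    return sorted(set(values))


# Two reducers each saw an arbitrary subset of the same column's values.
reducer_0 = build_dict(["apple", "mango", "zebra"])
reducer_1 = build_dict(["banana", "kiwi"])

# Appending the flattened arrays keeps each part sorted, but not the whole.
concatenated = reducer_0 + reducer_1
ids = {value: i for i, value in enumerate(concatenated)}
order_preserved = all(a < b for a, b in zip(concatenated, concatenated[1:]))
```

In this model "zebra" receives a smaller ID than "banana" although it sorts
after it, so the combined array no longer preserves value order.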



[jira] [Commented] (KYLIN-2217) Reducers build dictionaries locally

2017-01-14 Thread kangkaisen (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822812#comment-15822812
 ] 

kangkaisen commented on KYLIN-2217:
---

Building a global dict with multiple reducers may need a lot of work.



[jira] [Commented] (KYLIN-2217) Reducers build dictionaries locally

2017-01-13 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822638#comment-15822638
 ] 

Shaofeng SHI commented on KYLIN-2217:
-

Scanning the fact table twice is costly, and we should avoid it. I think the
dictionaries can be merged (in the job node) after being built in the
reducers. The memory footprint of a merge is much smaller than that of a
build, so it is acceptable for the job node. Would this be better?



[jira] [Commented] (KYLIN-2217) Reducers build dictionaries locally

2017-01-13 Thread XIE FAN (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821788#comment-15821788
 ] 

XIE FAN commented on KYLIN-2217:


Leaving the UHC dictionary building to the job engine is OK, but it may create
a single-point bottleneck. Actually, KYLIN-2217 is designed to remove this
bottleneck. If we want to take advantage of both KYLIN-2217 and KYLIN-2135,
there is another way: we can scan the fact table twice. The first scan tells
us the distribution of data in the UHC columns, so in the second scan we can
split the values across multiple reducers and ensure the order between
reducers based on the result of the first scan. This would resolve the
conflict, but it may require a lot of changes.
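
The two-scan idea can be sketched as range partitioning. Everything below is
an illustrative assumption rather than code from the patch: split_points
stands in for the first scan (here it simply looks at all values instead of a
sample), reducer_for routes values in the second scan, and dictionaries are
modelled as sorted arrays.

```python
import bisect


def split_points(sample, num_reducers):
    """First scan: choose num_reducers - 1 range boundaries from the data."""
    ordered = sorted(set(sample))
    step = len(ordered) / num_reducers
    return [ordered[int(step * i)] for i in range(1, num_reducers)]


def reducer_for(value, boundaries):
    """Second scan: send a value to the reducer that owns its range."""
    return bisect.bisect_right(boundaries, value)


values = ["kiwi", "apple", "zebra", "mango", "banana", "cherry"]
num_reducers = 3
boundaries = split_points(values, num_reducers)

partitions = [[] for _ in range(num_reducers)]
for v in values:
    partitions[reducer_for(v, boundaries)].append(v)

# Each reducer builds its own sorted dictionary; because the ranges do not
# overlap, laying the local dictionaries end to end is globally sorted.
local_dicts = [sorted(set(p)) for p in partitions]
global_order = [v for d in local_dicts for v in d]
```

Because every value in reducer i sorts before every value in reducer i + 1,
the order between reducers is fixed in advance, at the price of the extra
scan discussed above.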



[jira] [Commented] (KYLIN-2217) Reducers build dictionaries locally

2017-01-12 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821076#comment-15821076
 ] 

Shaofeng SHI commented on KYLIN-2217:
-

Hi [~xiefan46], can this change be compatible with KYLIN-2135? In KYLIN-2135,
a column's values are distributed to multiple reducers, so the dict might not
be buildable in a single reducer. In that case, can Kylin leave it for the job
engine to build? Thanks!



[jira] [Commented] (KYLIN-2217) Reducers build dictionaries locally

2016-12-10 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15739225#comment-15739225
 ] 

Shaofeng SHI commented on KYLIN-2217:
-

1.6.0 was already released on Nov 26; should this be in 1.6.1? [~xiefan46]

> Reducers build dictionaries locally
> ---
>
> Key: KYLIN-2217
> URL: https://issues.apache.org/jira/browse/KYLIN-2217
> Project: Kylin
> Issue Type: Improvement
> Affects Versions: v1.5.4.1
> Reporter: XIE FAN
> Assignee: XIE FAN
> Fix For: v1.6.0
>
> Attachments: 0001-KYLIN-2217-Reducers-build-dictionaries-locally.patch
>
>
> In KYLIN-1851, we reduced the peak memory usage of the dictionary-building
> procedure by splitting a single trie structure into a trie forest. But a
> bottleneck remains: all the dictionaries are built in the Kylin client.
> In this issue, we want to use multiple reducers to build different
> dictionaries locally and concurrently, which can further reduce the peak
> memory usage as well as speed up the dictionary-building procedure.





[jira] [Commented] (KYLIN-2217) Reducers build dictionaries locally

2016-12-10 Thread hongbin ma (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737863#comment-15737863
 ] 

hongbin ma commented on KYLIN-2217:
---

Hi [~xiefan46], please specify the fix version if possible.

> Reducers build dictionaries locally
> ---
>
> Key: KYLIN-2217
> URL: https://issues.apache.org/jira/browse/KYLIN-2217
> Project: Kylin
> Issue Type: Improvement
> Affects Versions: v1.5.4.1
> Reporter: XIE FAN
> Assignee: XIE FAN
> Fix For: Future
>
> Attachments: 0001-KYLIN-2217-Reducers-build-dictionaries-locally.patch
>
>
> In KYLIN-1851, we reduced the peak memory usage of the dictionary-building
> procedure by splitting a single trie structure into a trie forest. But a
> bottleneck remains: all the dictionaries are built in the Kylin client.
> In this issue, we want to use multiple reducers to build different
> dictionaries locally and concurrently, which can further reduce the peak
> memory usage as well as speed up the dictionary-building procedure.


