[jira] [Commented] (KYLIN-4342) Build Global Dict by MR/Hive New Version

2020-07-09 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154356#comment-17154356
 ] 

ASF subversion and git services commented on KYLIN-4342:


Commit f9ef8c699920b0d98fc2ad7a310a3b44738c883f in kylin's branch 
refs/heads/master from Zhong, Yanghong
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=f9ef8c6 ]

KYLIN-4342 Fix incorrect database


> Build Global Dict by MR/Hive New Version
> 
>
> Key: KYLIN-4342
> URL: https://issues.apache.org/jira/browse/KYLIN-4342
> Project: Kylin
> Issue Type: Improvement
> Reporter: wangxiaojing
> Assignee: wangxiaojing
> Priority: Major
> Fix For: v3.1.0
>
>
> At present, the MR/Hive implementation of the global dictionary has two 
> limitations and some distributed concurrency lock bugs:
> 1. Because Hive's ORDER BY performs a global sort in the shuffle stage, 
> memory use and build time become uncontrollable once the data volume reaches 
> the billion level. We tested at the 800-million level with 15 GB of memory 
> configured, and the dictionary build step took more than 10 hours.
> 2. Multiple global dictionary columns are computed serially.
> 3. There are some distributed concurrency lock bugs.
> We have improved the original version. The general idea of the new version is 
> the same as the previous MR/Hive implementation ([MR/Hive 
> V1|http://kylin.apache.org/docs30/howto/howto_use_hive_mr_dict.html]): 
> complete the global dictionary encoding through Hive or MR, then replace the 
> original values in the flat table with the dictionary-encoded values.
> However, the new version adds two MR steps, "parallel part build" and 
> "parallel total build", to replace the original "build dict" step and so 
> remove the two limitations above. It also uses ZooKeeper (ZK) to fix the 
> distributed concurrency lock bugs.
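
The description above does not spell out how the two new steps work, so the following is only a minimal sketch of one way a two-stage dictionary build can avoid a single global sort. It assumes that the "part build" stage hash-partitions the distinct values and that the "total build" stage turns per-partition counts into ID offsets. The function names `part_build`, `total_build`, and `encode` are hypothetical and are not taken from the Kylin code base.

```python
from itertools import accumulate
from zlib import crc32


def part_build(values, num_parts):
    """Stage 1 (assumed): hash-partition the distinct values.

    Each partition can be processed independently, so no global sort is needed.
    """
    parts = [set() for _ in range(num_parts)]
    for v in values:
        # crc32 is a stable hash, so a value always lands in the same partition.
        parts[crc32(v.encode("utf-8")) % num_parts].add(v)
    return [sorted(p) for p in parts]


def total_build(parts, existing):
    """Stage 2 (assumed): assign global IDs from per-partition offsets.

    Values already in the dictionary keep their IDs; new values get
    previous max ID + offset of their partition + local position.
    """
    max_id = max(existing.values(), default=0)
    new_parts = [[v for v in p if v not in existing] for p in parts]
    sizes = [len(p) for p in new_parts]
    # Exclusive prefix sum of partition sizes gives each partition's start offset.
    offsets = [max_id + s for s in accumulate([0] + sizes[:-1])]
    merged = dict(existing)
    for offset, part in zip(offsets, new_parts):
        for local_id, v in enumerate(part, start=1):
            merged[v] = offset + local_id
    return merged


def encode(column, dictionary):
    """Replace the original values of a flat-table column with dictionary IDs."""
    return [dictionary[v] for v in column]


existing = {"a": 1, "b": 2}               # dictionary from earlier builds
column = ["c", "a", "d", "c", "e"]        # values of the column in the new segment
dictionary = total_build(part_build(column, 3), existing)
encoded = encode(column, dictionary)
```

In this sketch the only cross-partition information is the list of partition sizes, which is why the ID assignment can run in parallel; which partition a given value lands in (and therefore which new ID it gets) depends on the hash and is not meaningful.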



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KYLIN-4342) Build Global Dict by MR/Hive New Version

2020-05-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120286#comment-17120286
 ] 

ASF GitHub Bot commented on KYLIN-4342:
---

hit-lacus closed pull request #1097:
URL: https://github.com/apache/kylin/pull/1097


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (KYLIN-4342) Build Global Dict by MR/Hive New Version

2020-05-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120285#comment-17120285
 ] 

ASF GitHub Bot commented on KYLIN-4342:
---

hit-lacus commented on pull request #1097:
URL: https://github.com/apache/kylin/pull/1097#issuecomment-636348553


   Close this because it is introduced in 
https://github.com/apache/kylin/pull/1207 .





[jira] [Commented] (KYLIN-4342) Build Global Dict by MR/Hive New Version

2020-05-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120189#comment-17120189
 ] 

ASF GitHub Bot commented on KYLIN-4342:
---

hit-lacus commented on pull request #1207:
URL: https://github.com/apache/kylin/pull/1207#issuecomment-636310944


   Thank you @wangxiaojing123 , let's merge it into master branch.





[jira] [Commented] (KYLIN-4342) Build Global Dict by MR/Hive New Version

2020-05-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120199#comment-17120199
 ] 

ASF subversion and git services commented on KYLIN-4342:


Commit a2489aaf4560adf7f415629519d6e4b617967dce in kylin's branch 
refs/heads/master from wangxiaojing
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=a2489aa ]

KYLIN-4342 Build Global Dict by MR/Hive New Version,  fix
 some potential bugs, such as null pointer exceptions




[jira] [Commented] (KYLIN-4342) Build Global Dict by MR/Hive New Version

2020-05-30 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120200#comment-17120200
 ] 

ASF subversion and git services commented on KYLIN-4342:


Commit 5731f43fcf350247e76fd7e36b0980d5cf9fc912 in kylin's branch 
refs/heads/master from XiaoxiangYu
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=5731f43 ]

KYLIN-4342 Improve code smell




[jira] [Commented] (KYLIN-4342) Build Global Dict by MR/Hive New Version

2020-05-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120191#comment-17120191
 ] 

ASF GitHub Bot commented on KYLIN-4342:
---

hit-lacus merged pull request #1207:
URL: https://github.com/apache/kylin/pull/1207


   





[jira] [Commented] (KYLIN-4342) Build Global Dict by MR/Hive New Version

2020-05-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119642#comment-17119642
 ] 

ASF GitHub Bot commented on KYLIN-4342:
---

codecov-commenter edited a comment on pull request #1207:
URL: https://github.com/apache/kylin/pull/1207#issuecomment-635917409


   # [Codecov](https://codecov.io/gh/apache/kylin/pull/1207?src=pr=h1) Report
   > Merging [#1207](https://codecov.io/gh/apache/kylin/pull/1207?src=pr=desc) into [master](https://codecov.io/gh/apache/kylin/commit/52edac1521b3b51d34e972b28df3d9dd462b394f=desc) will **increase** coverage by `0.00%`.
   > The diff coverage is `27.93%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/kylin/pull/1207/graphs/tree.svg?width=650=150=pr=JawVgbgsVo)](https://codecov.io/gh/apache/kylin/pull/1207?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1207      +/-   ##
   ============================================
     Coverage     25.92%   25.92%
   - Complexity     6597     6623      +26
   ============================================
     Files          1475     1476       +1
     Lines         90168    90394     +226
     Branches      12580    12622      +42
   ============================================
   + Hits          23375    23439      +64
   - Misses        64521    64662     +141
   - Partials       2272     2293      +21
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/kylin/pull/1207?src=pr=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...pache/kylin/cache/cachemanager/CacheConstants.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-Y2FjaGUvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2t5bGluL2NhY2hlL2NhY2hlbWFuYWdlci9DYWNoZUNvbnN0YW50cy5qYXZh) | `0.00% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | [.../cachemanager/RemoteLocalFailOverCacheManager.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-Y2FjaGUvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2t5bGluL2NhY2hlL2NhY2hlbWFuYWdlci9SZW1vdGVMb2NhbEZhaWxPdmVyQ2FjaGVNYW5hZ2VyLmphdmE=) | `78.57% <0.00%> (-6.05%)` | `7.00 <0.00> (ø)` | |
   | [...lin/rest/security/KylinAuthenticationProvider.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-c2VydmVyLWJhc2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2t5bGluL3Jlc3Qvc2VjdXJpdHkvS3lsaW5BdXRoZW50aWNhdGlvblByb3ZpZGVyLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | [...rg/apache/kylin/rest/service/KylinUserService.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-c2VydmVyLWJhc2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2t5bGluL3Jlc3Qvc2VydmljZS9LeWxpblVzZXJTZXJ2aWNlLmphdmE=) | `0.00% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | [...t/spy/memcached/protocol/TCPMemcachedNodeImpl.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-Y2FjaGUvc3JjL21haW4vamF2YS9uZXQvc3B5L21lbWNhY2hlZC9wcm90b2NvbC9UQ1BNZW1jYWNoZWROb2RlSW1wbC5qYXZh) | `25.65% <25.65%> (ø)` | `23.00 <23.00> (?)` | |
   | [...ylin/cache/cachemanager/MemcachedCacheManager.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-Y2FjaGUvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2t5bGluL2NhY2hlL2NhY2hlbWFuYWdlci9NZW1jYWNoZWRDYWNoZU1hbmFnZXIuamF2YQ==) | `46.03% <100.00%> (+2.69%)` | `7.00 <0.00> (ø)` | |
   | [...g/apache/kylin/cache/memcached/MemcachedCache.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-Y2FjaGUvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2t5bGluL2NhY2hlL21lbWNhY2hlZC9NZW1jYWNoZWRDYWNoZS5qYXZh) | `49.10% <100.00%> (+1.89%)` | `23.00 <3.00> (+3.00)` | |
   | [...org/apache/kylin/rest/util/QueryRequestLimits.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-c2VydmVyLWJhc2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2t5bGluL3Jlc3QvdXRpbC9RdWVyeVJlcXVlc3RMaW1pdHMuamF2YQ==) | `35.71% <0.00%> (-4.77%)` | `5.00% <0.00%> (-1.00%)` | |
   | [.../apache/kylin/cube/cuboid/TreeCuboidScheduler.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-Y29yZS1jdWJlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9reWxpbi9jdWJlL2N1Ym9pZC9UcmVlQ3Vib2lkU2NoZWR1bGVyLmphdmE=) | `63.84% <0.00%> (-2.31%)` | `0.00% <0.00%> (ø%)` | |
   | [...a/org/apache/kylin/dict/Number2BytesConverter.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-Y29yZS1kaWN0aW9uYXJ5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9reWxpbi9kaWN0L051bWJlcjJCeXRlc0NvbnZlcnRlci5qYXZh) | `81.74% <0.00%> (-0.80%)` | `17.00% <0.00%> (-1.00%)` | |
   | ... and [4 more](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree-more) | |
   
   --
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/kylin/pull/1207?src=pr=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 

[jira] [Commented] (KYLIN-4342) Build Global Dict by MR/Hive New Version

2020-05-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119500#comment-17119500
 ] 

ASF GitHub Bot commented on KYLIN-4342:
---

lgtm-com[bot] commented on pull request #1207:
URL: https://github.com/apache/kylin/pull/1207#issuecomment-635918529


   This pull request **fixes 1 alert** when merging 
a3da3c9630f6c8c04a91fd416c5118ac892d6b16 into 
ead437ab41020aa47bddb067b566cdc874dfa286 - [view on 
LGTM.com](https://lgtm.com/projects/g/apache/kylin/rev/pr-20d691ee5af73eb79d3dadde5deb498d04e01a03)
   
   **fixed alerts:**
   
   * 1 for Boxed variable is never null





[jira] [Commented] (KYLIN-4342) Build Global Dict by MR/Hive New Version

2020-05-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119498#comment-17119498
 ] 

ASF GitHub Bot commented on KYLIN-4342:
---

codecov-commenter commented on pull request #1207:
URL: https://github.com/apache/kylin/pull/1207#issuecomment-635917409


   # [Codecov](https://codecov.io/gh/apache/kylin/pull/1207?src=pr=h1) Report
   > Merging [#1207](https://codecov.io/gh/apache/kylin/pull/1207?src=pr=desc) into [master](https://codecov.io/gh/apache/kylin/commit/52edac1521b3b51d34e972b28df3d9dd462b394f=desc) will **decrease** coverage by `0.15%`.
   > The diff coverage is `0.73%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/kylin/pull/1207/graphs/tree.svg?width=650=150=pr=JawVgbgsVo)](https://codecov.io/gh/apache/kylin/pull/1207?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1207      +/-   ##
   ============================================
   - Coverage     25.92%   25.77%   -0.16%
   - Complexity     6597     6622      +25
   ============================================
     Files          1475     1482       +7
     Lines         90168    90949     +781
     Branches      12580    12687     +107
   ============================================
   + Hits          23375    23441      +66
   - Misses        64521    65215     +694
   - Partials       2272     2293      +21
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/kylin/pull/1207?src=pr=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...n/job/lock/zookeeper/ZookeeperDistributedLock.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-Y29yZS1qb2Ivc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2t5bGluL2pvYi9sb2NrL3pvb2tlZXBlci9ab29rZWVwZXJEaXN0cmlidXRlZExvY2suamF2YQ==) | `49.35% <0.00%> (ø)` | `18.00 <0.00> (ø)` | |
   | [...in/measure/bitmap/BitmapIntersectValueAggFunc.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWVhc3VyZS9iaXRtYXAvQml0bWFwSW50ZXJzZWN0VmFsdWVBZ2dGdW5jLmphdmE=) | `0.00% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | [...apache/kylin/measure/bitmap/BitmapMeasureType.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWVhc3VyZS9iaXRtYXAvQml0bWFwTWVhc3VyZVR5cGUuamF2YQ==) | `18.51% <0.00%> (ø)` | `4.00 <0.00> (ø)` | |
   | [.../org/apache/kylin/metadata/model/FunctionDesc.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWV0YWRhdGEvbW9kZWwvRnVuY3Rpb25EZXNjLmphdmE=) | `23.37% <ø> (ø)` | `18.00 <0.00> (ø)` | |
   | [...apache/kylin/engine/mr/BatchCubingJobBuilder2.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-ZW5naW5lLW1yL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9reWxpbi9lbmdpbmUvbXIvQmF0Y2hDdWJpbmdKb2JCdWlsZGVyMi5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | [.../org/apache/kylin/engine/mr/JobBuilderSupport.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-ZW5naW5lLW1yL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9reWxpbi9lbmdpbmUvbXIvSm9iQnVpbGRlclN1cHBvcnQuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | [...ache/kylin/engine/mr/common/BaseCuboidBuilder.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-ZW5naW5lLW1yL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9reWxpbi9lbmdpbmUvbXIvY29tbW9uL0Jhc2VDdWJvaWRCdWlsZGVyLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | [...gine/mr/steps/BuildGlobalHiveDictPartBuildJob.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-ZW5naW5lLW1yL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9reWxpbi9lbmdpbmUvbXIvc3RlcHMvQnVpbGRHbG9iYWxIaXZlRGljdFBhcnRCdWlsZEpvYi5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | [...e/mr/steps/BuildGlobalHiveDictPartBuildMapper.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-ZW5naW5lLW1yL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9reWxpbi9lbmdpbmUvbXIvc3RlcHMvQnVpbGRHbG9iYWxIaXZlRGljdFBhcnRCdWlsZE1hcHBlci5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | [.../mr/steps/BuildGlobalHiveDictPartBuildReducer.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-ZW5naW5lLW1yL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9reWxpbi9lbmdpbmUvbXIvc3RlcHMvQnVpbGRHbG9iYWxIaXZlRGljdFBhcnRCdWlsZFJlZHVjZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
   | ... and [26 more](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree-more) | |
   
   --
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/kylin/pull/1207?src=pr=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > 

[jira] [Commented] (KYLIN-4342) Build Global Dict by MR/Hive New Version

2020-01-18 Thread Xiaoxiang Yu (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018765#comment-17018765
 ] 

Xiaoxiang Yu commented on KYLIN-4342:
-

Great feature for apache kylin!

> Build Global Dict by MR/Hive New Version
> 
>
> Key: KYLIN-4342
> URL: https://issues.apache.org/jira/browse/KYLIN-4342
> Project: Kylin
> Issue Type: Improvement
> Affects Versions: Future
> Reporter: wangxiaojing
> Assignee: wangxiaojing
> Priority: Major
>