[jira] [Commented] (KYLIN-4342) Build Global Dict by MR/Hive New Version
[ https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154356#comment-17154356 ]

ASF subversion and git services commented on KYLIN-4342:

Commit f9ef8c699920b0d98fc2ad7a310a3b44738c883f in kylin's branch refs/heads/master from Zhong, Yanghong
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=f9ef8c6 ]

KYLIN-4342 Fix incorrect database

> Build Global Dict by MR/Hive New Version
>
> Key: KYLIN-4342
> URL: https://issues.apache.org/jira/browse/KYLIN-4342
> Project: Kylin
> Issue Type: Improvement
> Reporter: wangxiaojing
> Assignee: wangxiaojing
> Priority: Major
> Fix For: v3.1.0
>
> At present, the implementation of the global dictionary through MR/Hive
> has two limitations and some distributed concurrency lock bugs:
> 1. Because Hive's ORDER BY performs a global sort in the shuffle stage,
> memory use and build time become uncontrollable once the data volume
> reaches the billion level. In our test with a cardinality of about
> 800 million and 15 GB of memory configured, building the dictionary
> took more than 10 hours.
> 2. Multiple global dictionary columns are calculated serially.
> 3. There are some distributed concurrency lock bugs.
> We have improved the original version. The general idea of the new
> version is the same as the previous MR/Hive implementation: complete the
> global dictionary encoding through Hive or MR, and then replace the
> original values in the flat table with the dictionary-encoded values.
> [MR/Hive V1|http://kylin.apache.org/docs30/howto/howto_use_hive_mr_dict.html]
> However, the new version adds two MR steps, "parallel part build" and
> "parallel total build", to replace the original "build dict" step, so as
> to resolve the two limitations above. It also uses ZooKeeper (ZK) to fix
> the distributed concurrency lock bugs.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
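The issue description does not spell out how the two new MR steps work internally. As a rough illustration only, one common way to avoid a single global sort is to hash-partition the distinct values, build each partition independently, and then give every partition its own contiguous ID range from a running offset. The sketch below is plain Python, not Kylin code; the function names `part_build`, `total_build` and `encode_column`, the CRC32 partitioning, and the thread pool standing in for parallel reducers are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def part_build(values, num_parts):
    """Assumed 'part build': hash-partition the values, then sort the
    distinct values of each partition independently (here in threads)."""
    buckets = [set() for _ in range(num_parts)]
    for v in values:
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        buckets[crc32(v.encode("utf-8")) % num_parts].add(v)
    with ThreadPoolExecutor(max_workers=num_parts) as pool:
        return list(pool.map(sorted, buckets))


def total_build(parts, start_id=1):
    """Assumed 'total build': assign each partition a contiguous ID range
    using a running offset, so IDs are unique without a global sort."""
    dictionary = {}
    offset = start_id
    for part in parts:
        for i, v in enumerate(part):
            dictionary[v] = offset + i
        offset += len(part)
    return dictionary


def encode_column(rows, dictionary):
    """Replace each original value with its dictionary-encoded ID."""
    return [dictionary[v] for v in rows]


if __name__ == "__main__":
    rows = ["u3", "u1", "u2", "u1", "u3"]
    global_dict = total_build(part_build(rows, num_parts=3))
    print(encode_column(rows, global_dict))
```

The IDs produced this way are dense and unique but not ordered by value. Whether that matches the ordering guarantees of the real implementation is not stated in the issue, so treat it as a property of this sketch only.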
[jira] [Commented] (KYLIN-4342) Build Global Dict by MR/Hive New Version
[ https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154352#comment-17154352 ]

ASF subversion and git services commented on KYLIN-4342:

Commit f9ef8c699920b0d98fc2ad7a310a3b44738c883f in kylin's branch refs/heads/master from Zhong, Yanghong
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=f9ef8c6 ]

KYLIN-4342 Fix incorrect database
[jira] [Commented] (KYLIN-4342) Build Global Dict by MR/Hive New Version
[ https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120286#comment-17120286 ]

ASF GitHub Bot commented on KYLIN-4342:
---

hit-lacus closed pull request #1097:
URL: https://github.com/apache/kylin/pull/1097

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[jira] [Commented] (KYLIN-4342) Build Global Dict by MR/Hive New Version
[ https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120285#comment-17120285 ]

ASF GitHub Bot commented on KYLIN-4342:
---

hit-lacus commented on pull request #1097:
URL: https://github.com/apache/kylin/pull/1097#issuecomment-636348553

Close this because it is introduced in https://github.com/apache/kylin/pull/1207 .
[jira] [Commented] (KYLIN-4342) Build Global Dict by MR/Hive New Version
[ https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120189#comment-17120189 ]

ASF GitHub Bot commented on KYLIN-4342:
---

hit-lacus commented on pull request #1207:
URL: https://github.com/apache/kylin/pull/1207#issuecomment-636310944

Thank you @wangxiaojing123 , let's merge it into master branch.
[jira] [Commented] (KYLIN-4342) Build Global Dict by MR/Hive New Version
[ https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120199#comment-17120199 ]

ASF subversion and git services commented on KYLIN-4342:

Commit a2489aaf4560adf7f415629519d6e4b617967dce in kylin's branch refs/heads/master from wangxiaojing
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=a2489aa ]

KYLIN-4342 Build Global Dict by MR/Hive New Version, fix some potential bugs, such as null pointer exceptions
[jira] [Commented] (KYLIN-4342) Build Global Dict by MR/Hive New Version
[ https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120200#comment-17120200 ]

ASF subversion and git services commented on KYLIN-4342:

Commit 5731f43fcf350247e76fd7e36b0980d5cf9fc912 in kylin's branch refs/heads/master from XiaoxiangYu
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=5731f43 ]

KYLIN-4342 Improve code smell
[jira] [Commented] (KYLIN-4342) Build Global Dict by MR/Hive New Version
[ https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120191#comment-17120191 ]

ASF GitHub Bot commented on KYLIN-4342:
---

hit-lacus merged pull request #1207:
URL: https://github.com/apache/kylin/pull/1207
[jira] [Commented] (KYLIN-4342) Build Global Dict by MR/Hive New Version
[ https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119642#comment-17119642 ]

ASF GitHub Bot commented on KYLIN-4342:
---

codecov-commenter edited a comment on pull request #1207:
URL: https://github.com/apache/kylin/pull/1207#issuecomment-635917409

# [Codecov](https://codecov.io/gh/apache/kylin/pull/1207?src=pr=h1) Report
> Merging [#1207](https://codecov.io/gh/apache/kylin/pull/1207?src=pr=desc) into [master](https://codecov.io/gh/apache/kylin/commit/52edac1521b3b51d34e972b28df3d9dd462b394f=desc) will **increase** coverage by `0.00%`.
> The diff coverage is `27.93%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/kylin/pull/1207/graphs/tree.svg?width=650=150=pr=JawVgbgsVo)](https://codecov.io/gh/apache/kylin/pull/1207?src=pr=tree)

```diff
@@             Coverage Diff             @@
##             master    #1207     +/-   ##
===========================================
  Coverage     25.92%   25.92%
- Complexity     6597     6623     +26
===========================================
  Files          1475     1476      +1
  Lines         90168    90394    +226
  Branches      12580    12622     +42
===========================================
+ Hits          23375    23439     +64
- Misses        64521    64662    +141
- Partials       2272     2293     +21
```

| [Impacted Files](https://codecov.io/gh/apache/kylin/pull/1207?src=pr=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...pache/kylin/cache/cachemanager/CacheConstants.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-Y2FjaGUvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2t5bGluL2NhY2hlL2NhY2hlbWFuYWdlci9DYWNoZUNvbnN0YW50cy5qYXZh) | `0.00% <ø> (ø)` | `0.00 <0.00> (ø)` | |
| [.../cachemanager/RemoteLocalFailOverCacheManager.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-Y2FjaGUvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2t5bGluL2NhY2hlL2NhY2hlbWFuYWdlci9SZW1vdGVMb2NhbEZhaWxPdmVyQ2FjaGVNYW5hZ2VyLmphdmE=) | `78.57% <0.00%> (-6.05%)` | `7.00 <0.00> (ø)` | |
| [...lin/rest/security/KylinAuthenticationProvider.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-c2VydmVyLWJhc2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2t5bGluL3Jlc3Qvc2VjdXJpdHkvS3lsaW5BdXRoZW50aWNhdGlvblByb3ZpZGVyLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
| [...rg/apache/kylin/rest/service/KylinUserService.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-c2VydmVyLWJhc2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2t5bGluL3Jlc3Qvc2VydmljZS9LeWxpblVzZXJTZXJ2aWNlLmphdmE=) | `0.00% <ø> (ø)` | `0.00 <0.00> (ø)` | |
| [...t/spy/memcached/protocol/TCPMemcachedNodeImpl.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-Y2FjaGUvc3JjL21haW4vamF2YS9uZXQvc3B5L21lbWNhY2hlZC9wcm90b2NvbC9UQ1BNZW1jYWNoZWROb2RlSW1wbC5qYXZh) | `25.65% <25.65%> (ø)` | `23.00 <23.00> (?)` | |
| [...ylin/cache/cachemanager/MemcachedCacheManager.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-Y2FjaGUvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2t5bGluL2NhY2hlL2NhY2hlbWFuYWdlci9NZW1jYWNoZWRDYWNoZU1hbmFnZXIuamF2YQ==) | `46.03% <100.00%> (+2.69%)` | `7.00 <0.00> (ø)` | |
| [...g/apache/kylin/cache/memcached/MemcachedCache.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-Y2FjaGUvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2t5bGluL2NhY2hlL21lbWNhY2hlZC9NZW1jYWNoZWRDYWNoZS5qYXZh) | `49.10% <100.00%> (+1.89%)` | `23.00 <3.00> (+3.00)` | |
| [...org/apache/kylin/rest/util/QueryRequestLimits.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-c2VydmVyLWJhc2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2t5bGluL3Jlc3QvdXRpbC9RdWVyeVJlcXVlc3RMaW1pdHMuamF2YQ==) | `35.71% <0.00%> (-4.77%)` | `5.00% <0.00%> (-1.00%)` | |
| [.../apache/kylin/cube/cuboid/TreeCuboidScheduler.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-Y29yZS1jdWJlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9reWxpbi9jdWJlL2N1Ym9pZC9UcmVlQ3Vib2lkU2NoZWR1bGVyLmphdmE=) | `63.84% <0.00%> (-2.31%)` | `0.00% <0.00%> (ø%)` | |
| [...a/org/apache/kylin/dict/Number2BytesConverter.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-Y29yZS1kaWN0aW9uYXJ5L3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9reWxpbi9kaWN0L051bWJlcjJCeXRlc0NvbnZlcnRlci5qYXZh) | `81.74% <0.00%> (-0.80%)` | `17.00% <0.00%> (-1.00%)` | |
| ... and [4 more](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree-more) | |

--

[Continue to review full report at Codecov](https://codecov.io/gh/apache/kylin/pull/1207?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by
[jira] [Commented] (KYLIN-4342) Build Global Dict by MR/Hive New Version
[ https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119500#comment-17119500 ]

ASF GitHub Bot commented on KYLIN-4342:
---

lgtm-com[bot] commented on pull request #1207:
URL: https://github.com/apache/kylin/pull/1207#issuecomment-635918529

This pull request **fixes 1 alert** when merging a3da3c9630f6c8c04a91fd416c5118ac892d6b16 into ead437ab41020aa47bddb067b566cdc874dfa286 - [view on LGTM.com](https://lgtm.com/projects/g/apache/kylin/rev/pr-20d691ee5af73eb79d3dadde5deb498d04e01a03)

**fixed alerts:**

* 1 for Boxed variable is never null
[jira] [Commented] (KYLIN-4342) Build Global Dict by MR/Hive New Version
[ https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119498#comment-17119498 ]

ASF GitHub Bot commented on KYLIN-4342:
---

codecov-commenter commented on pull request #1207:
URL: https://github.com/apache/kylin/pull/1207#issuecomment-635917409

# [Codecov](https://codecov.io/gh/apache/kylin/pull/1207?src=pr=h1) Report
> Merging [#1207](https://codecov.io/gh/apache/kylin/pull/1207?src=pr=desc) into [master](https://codecov.io/gh/apache/kylin/commit/52edac1521b3b51d34e972b28df3d9dd462b394f=desc) will **decrease** coverage by `0.15%`.
> The diff coverage is `0.73%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/kylin/pull/1207/graphs/tree.svg?width=650=150=pr=JawVgbgsVo)](https://codecov.io/gh/apache/kylin/pull/1207?src=pr=tree)

```diff
@@             Coverage Diff              @@
##             master    #1207      +/-   ##
============================================
- Coverage     25.92%   25.77%   -0.16%
- Complexity     6597     6622      +25
============================================
  Files          1475     1482       +7
  Lines         90168    90949     +781
  Branches      12580    12687     +107
============================================
+ Hits          23375    23441      +66
- Misses        64521    65215     +694
- Partials       2272     2293      +21
```

| [Impacted Files](https://codecov.io/gh/apache/kylin/pull/1207?src=pr=tree) | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| [...n/job/lock/zookeeper/ZookeeperDistributedLock.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-Y29yZS1qb2Ivc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2t5bGluL2pvYi9sb2NrL3pvb2tlZXBlci9ab29rZWVwZXJEaXN0cmlidXRlZExvY2suamF2YQ==) | `49.35% <0.00%> (ø)` | `18.00 <0.00> (ø)` | |
| [...in/measure/bitmap/BitmapIntersectValueAggFunc.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWVhc3VyZS9iaXRtYXAvQml0bWFwSW50ZXJzZWN0VmFsdWVBZ2dGdW5jLmphdmE=) | `0.00% <ø> (ø)` | `0.00 <0.00> (ø)` | |
| [...apache/kylin/measure/bitmap/BitmapMeasureType.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWVhc3VyZS9iaXRtYXAvQml0bWFwTWVhc3VyZVR5cGUuamF2YQ==) | `18.51% <0.00%> (ø)` | `4.00 <0.00> (ø)` | |
| [.../org/apache/kylin/metadata/model/FunctionDesc.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-Y29yZS1tZXRhZGF0YS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3lsaW4vbWV0YWRhdGEvbW9kZWwvRnVuY3Rpb25EZXNjLmphdmE=) | `23.37% <ø> (ø)` | `18.00 <0.00> (ø)` | |
| [...apache/kylin/engine/mr/BatchCubingJobBuilder2.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-ZW5naW5lLW1yL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9reWxpbi9lbmdpbmUvbXIvQmF0Y2hDdWJpbmdKb2JCdWlsZGVyMi5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
| [.../org/apache/kylin/engine/mr/JobBuilderSupport.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-ZW5naW5lLW1yL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9reWxpbi9lbmdpbmUvbXIvSm9iQnVpbGRlclN1cHBvcnQuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
| [...ache/kylin/engine/mr/common/BaseCuboidBuilder.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-ZW5naW5lLW1yL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9reWxpbi9lbmdpbmUvbXIvY29tbW9uL0Jhc2VDdWJvaWRCdWlsZGVyLmphdmE=) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
| [...gine/mr/steps/BuildGlobalHiveDictPartBuildJob.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-ZW5naW5lLW1yL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9reWxpbi9lbmdpbmUvbXIvc3RlcHMvQnVpbGRHbG9iYWxIaXZlRGljdFBhcnRCdWlsZEpvYi5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
| [...e/mr/steps/BuildGlobalHiveDictPartBuildMapper.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-ZW5naW5lLW1yL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9reWxpbi9lbmdpbmUvbXIvc3RlcHMvQnVpbGRHbG9iYWxIaXZlRGljdFBhcnRCdWlsZE1hcHBlci5qYXZh) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
| [.../mr/steps/BuildGlobalHiveDictPartBuildReducer.java](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree#diff-ZW5naW5lLW1yL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9reWxpbi9lbmdpbmUvbXIvc3RlcHMvQnVpbGRHbG9iYWxIaXZlRGljdFBhcnRCdWlsZFJlZHVjZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (?)` | |
| ... and [26 more](https://codecov.io/gh/apache/kylin/pull/1207/diff?src=pr=tree-more) | |

--

[Continue to review full report at Codecov](https://codecov.io/gh/apache/kylin/pull/1207?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
>
[jira] [Commented] (KYLIN-4342) Build Global Dict by MR/Hive New Version
[ https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018765#comment-17018765 ]

Xiaoxiang Yu commented on KYLIN-4342:
-

Great feature for apache kylin!

> Build Global Dict by MR/Hive New Version
>
> Key: KYLIN-4342
> URL: https://issues.apache.org/jira/browse/KYLIN-4342
> Project: Kylin
> Issue Type: Improvement
> Affects Versions: Future
> Reporter: wangxiaojing
> Assignee: wangxiaojing
> Priority: Major