[jira] [Created] (KYLIN-2406) TPC-H query 20, can triggers NPE
liyang created KYLIN-2406:

Summary: TPC-H query 20, can triggers NPE
Key: KYLIN-2406
URL: https://issues.apache.org/jira/browse/KYLIN-2406
Project: Kylin
Issue Type: Bug
Reporter: liyang

The query below triggers an NPE:
{code}
with tmp3 as (
    select l_partkey, 0.5 * sum(l_quantity) as sum_quantity, l_suppkey
    from v_lineitem
    inner join supplier on l_suppkey = s_suppkey
    inner join nation on s_nationkey = n_nationkey
    inner join part on l_partkey = p_partkey
    where l_shipdate >= '1992-01-01' and l_shipdate <= '1995-01-01'
    and n_name = 'CANADA'
    and p_name like 'forest%'
    group by l_partkey, l_suppkey
)
select s_name, s_address
from v_partsupp
inner join tmp3 on ps_partkey = l_partkey and ps_suppkey = l_suppkey
inner join supplier on ps_suppkey = s_suppkey
where ps_availqty > sum_quantity
group by s_name, s_address
order by s_name
{code}

The query below works. The only difference is the order of "inner join tmp3" and "inner join supplier":
{code}
with tmp3 as (
    select l_partkey, 0.5 * sum(l_quantity) as sum_quantity, l_suppkey
    from v_lineitem
    inner join supplier on l_suppkey = s_suppkey
    inner join nation on s_nationkey = n_nationkey
    inner join part on l_partkey = p_partkey
    where l_shipdate >= '1992-01-01' and l_shipdate <= '1995-01-01'
    and n_name = 'CANADA'
    and p_name like 'forest%'
    group by l_partkey, l_suppkey
)
select s_name, s_address
from v_partsupp
inner join supplier on ps_suppkey = s_suppkey
inner join tmp3 on ps_partkey = l_partkey and ps_suppkey = l_suppkey
where ps_availqty > sum_quantity
group by s_name, s_address
order by s_name
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KYLIN-2405) No cube detail info loaded.
readme_kylin created KYLIN-2405:

Summary: No cube detail info loaded.
Key: KYLIN-2405
URL: https://issues.apache.org/jira/browse/KYLIN-2405
Project: Kylin
Issue Type: Bug
Affects Versions: v1.6.0
Environment: hadoop 2.6.4, hive 2.1.0, hbase 1.2.4
Reporter: readme_kylin

I deployed two Kylin servers. The query-mode server always throws an "Oops" error: "no cube detail info loaded". Here is the kylin.log:

2017-01-18 14:41:13,514 INFO [http-bio-7070-exec-2324] project.ProjectL2Cache:174 : Loading L2 project cache for 4399_ADGAME
2017-01-18 14:41:13,514 WARN [http-bio-7070-exec-2324] realization.RealizationRegistry:120 : No provider for realization type INVERTED_INDEX
2017-01-18 14:41:13,515 WARN [http-bio-7070-exec-2324] realization.RealizationRegistry:120 : No provider for realization type INVERTED_INDEX
2017-01-18 14:41:13,515 WARN [http-bio-7070-exec-2324] realization.RealizationRegistry:120 : No provider for realization type INVERTED_INDEX
2017-01-18 14:41:13,515 WARN [http-bio-7070-exec-2324] realization.RealizationRegistry:120 : No provider for realization type INVERTED_INDEX
2017-01-18 14:41:13,516 WARN [http-bio-7070-exec-2324] realization.RealizationRegistry:120 : No provider for realization type INVERTED_INDEX
2017-01-18 14:41:13,516 WARN [http-bio-7070-exec-2324] realization.RealizationRegistry:120 : No provider for realization type INVERTED_INDEX
2017-01-18 14:41:13,516 ERROR [http-bio-7070-exec-2324] controller.TableController:100 : Failed to deal with the request.
java.lang.NullPointerException
    at org.apache.kylin.cube.CubeInstance.getAllColumns(CubeInstance.java:393)
    at org.apache.kylin.metadata.project.ProjectL2Cache.sanityCheck(ProjectL2Cache.java:236)
    at org.apache.kylin.metadata.project.ProjectL2Cache.loadCache(ProjectL2Cache.java:220)
    at org.apache.kylin.metadata.project.ProjectL2Cache.getCache(ProjectL2Cache.java:167)
    at org.apache.kylin.metadata.project.ProjectL2Cache.listDefinedTables(ProjectL2Cache.java:75)
    at org.apache.kylin.metadata.project.ProjectManager.listDefinedTables(ProjectManager.java:405)
    at org.apache.kylin.rest.controller.TableController.getHiveTables(TableController.java:98)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:221)
    at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
    at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:104)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandleMethod(RequestMappingHandlerAdapter.java:743)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:672)
    at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:82)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:933)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:867)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:951)
    at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:842)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:624)
    at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:827)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
    at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:118)
    at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:84)
    at
[jira] [Updated] (KYLIN-2404) Add "hive.merge.mapfiles" and "hive.merge.mapredfiles" to kylin_hive_conf.xml
[ https://issues.apache.org/jira/browse/KYLIN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shaofeng SHI updated KYLIN-2404:

Description:
Since 1.5.3, Kylin uses a "redistribute" step to merge small files to a proper size after creating the intermediate Hive table. In some users' environments, Hive's own small-file merging is enabled by default, which costs additional CPU and hurts cube build performance (in the extreme case the files are merged to 256MB, so only a very small number of mappers are started for the build).

So Kylin should explicitly tell Hive to disable the small-file merge feature when creating and redistributing the intermediate flat table. Adding "hive.merge.mapfiles" and "hive.merge.mapredfiles" to conf/kylin_hive_conf.xml with value "false" will solve this. The meaning of these two parameters can be found in https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration

was:
Since 1.5.3, Kylin uses a "redistribute" step to merge small files to a proper size after creating the intermediate Hive table. In some users' environments, Hive's own small-file merging is enabled by default, which costs additional CPU and hurts cube build performance (in the extreme case the files are merged to 256MB, so only a very small number of mappers are started for the build).

So Kylin should explicitly tell Hive to disable the small-file merge feature when creating and redistributing the intermediate flat table. Will add "hive.merge.mapfiles" and "hive.merge.mapredfiles" to conf/kylin_hive_conf.xml with value "false". The meaning of these two parameters can be found in https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration

> Add "hive.merge.mapfiles" and "hive.merge.mapredfiles" to kylin_hive_conf.xml
>
> Key: KYLIN-2404
> URL: https://issues.apache.org/jira/browse/KYLIN-2404
> Project: Kylin
> Issue Type: Improvement
> Components: Job Engine
> Reporter: Shaofeng SHI
> Assignee: Shaofeng SHI
> Priority: Minor
> Fix For: v2.0.0
[jira] [Updated] (KYLIN-2404) Add "hive.merge.mapfiles" and "hive.merge.mapredfiles" to kylin_hive_conf.xml
[ https://issues.apache.org/jira/browse/KYLIN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shaofeng SHI updated KYLIN-2404:

Description:
Since 1.5.3, Kylin uses a "redistribute" step to merge small files to a proper size after creating the intermediate Hive table. In some users' environments, Hive's own small-file merging is enabled by default, which costs additional CPU and hurts cube build performance (in the extreme case the files are merged to 256MB, so only a very small number of mappers are started for the build).

So Kylin should explicitly tell Hive to disable the small-file merge feature when creating and redistributing the intermediate flat table. Will add "hive.merge.mapfiles" and "hive.merge.mapredfiles" to conf/kylin_hive_conf.xml with value "false". The meaning of these two parameters can be found in https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration

was:
Since 1.5.3, Kylin uses a "redistribute" step to merge small files to a proper size after creating the intermediate Hive table. In some users' environments, Hive's own small-file merging is enabled by default, which costs additional CPU and hurts cube build performance (in the extreme case the files are merged to 256MB, so only a very small number of mappers are started for the build).

So Kylin should explicitly tell Hive to disable the small-file merge feature when creating and redistributing the intermediate flat table. Will add "hive.merge.mapfiles" and "hive.merge.mapredfiles" to conf/kylin_hive_conf.xml with value "false".
[jira] [Created] (KYLIN-2404) Add "hive.merge.mapfiles" and "hive.merge.mapredfiles" to kylin_hive_conf.xml
Shaofeng SHI created KYLIN-2404:

Summary: Add "hive.merge.mapfiles" and "hive.merge.mapredfiles" to kylin_hive_conf.xml
Key: KYLIN-2404
URL: https://issues.apache.org/jira/browse/KYLIN-2404
Project: Kylin
Issue Type: Improvement
Components: Job Engine
Reporter: Shaofeng SHI
Assignee: Shaofeng SHI
Priority: Minor
Fix For: v2.0.0

Since 1.5.3, Kylin uses a "redistribute" step to merge small files to a proper size after creating the intermediate Hive table. In some users' environments, Hive's own small-file merging is enabled by default, which costs additional CPU and hurts cube build performance (in the extreme case the files are merged to 256MB, so only a very small number of mappers are started for the build).

So Kylin should explicitly tell Hive to disable the small-file merge feature when creating and redistributing the intermediate flat table. Will add "hive.merge.mapfiles" and "hive.merge.mapredfiles" to conf/kylin_hive_conf.xml with value "false".
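
As a minimal sketch of the change proposed in KYLIN-2404, the Python snippet below renders the two entries in the Hadoop-style <property> layout that conf/kylin_hive_conf.xml is assumed to use. The property names and the value "false" come from the issue description; the helper names (hive_property, merge_disabling_properties, render) are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET


def hive_property(name: str, value: str) -> ET.Element:
    """Build one <property> element (Hadoop-style layout, assumed here)."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop


def merge_disabling_properties() -> list:
    """The two settings named in the issue, both set to "false"."""
    return [
        hive_property("hive.merge.mapfiles", "false"),
        hive_property("hive.merge.mapredfiles", "false"),
    ]


def render(props) -> str:
    """Serialize each element on its own line."""
    return "\n".join(ET.tostring(p, encoding="unicode") for p in props)


if __name__ == "__main__":
    # Prints the two entries that would be pasted inside <configuration>.
    print(render(merge_disabling_properties()))
```

The printed entries would go inside the existing <configuration> element of the file; whether a given deployment needs further Hive settings is not covered by the issue.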
[jira] [Created] (KYLIN-2403) tableau extract month in where
Pavel Tarasov created KYLIN-2403:

Summary: tableau extract month in where
Key: KYLIN-2403
URL: https://issues.apache.org/jira/browse/KYLIN-2403
Project: Kylin
Issue Type: Bug
Components: 3rd Party
Affects Versions: v1.5.4.1
Reporter: Pavel Tarasov

I have a problem connecting Tableau to Kylin. When I create a filter on month in Tableau, it generates a query with the filter
WHERE (({fn EXTRACT(MONTH FROM "TEST_ORDERFACT"."DADD")} - 1) / 3 + 1 = 2).

Detailed query example from Tableau:

SELECT "AMOCRM_MANAGERS"."NAME" AS "NAME__AMOCRM_MANAGERS_",
  {fn EXTRACT(MONTH FROM "TEST_ORDERFACT"."DADD")} AS "mn_DADD_ok",
  ({fn EXTRACT(MONTH FROM "TEST_ORDERFACT"."DADD")} - 1) / 3 + 1 AS "qr_DADD_ok",
  SUM("TEST_ORDERFACT"."AMOUNT") AS "sum_AMOUNT_ok",
  {fn EXTRACT(YEAR FROM "TEST_ORDERFACT"."DADD")} AS "yr_DADD_ok"
FROM "PTARASOV"."TEST_ORDERFACT" "TEST_ORDERFACT"
  INNER JOIN "REALTYANALYTICS"."CLIENTCATEGORIES" "CLIENTCATEGORIES" ON ("TEST_ORDERFACT"."CLIENTCATEGORY" = "CLIENTCATEGORIES"."ID")
  INNER JOIN "REALTYANALYTICS"."AMOCRM_MANAGERS" "AMOCRM_MANAGERS" ON ("TEST_ORDERFACT"."MANAGER" = "AMOCRM_MANAGERS"."ID")
  LEFT JOIN "REALTYANALYTICS"."LOCATIONS" "LOCATIONS" ON ("TEST_ORDERFACT"."REGION" = "LOCATIONS"."ID")
  LEFT JOIN "REALTYANALYTICS"."ORDERFACTSERVICEPACKAGESOURCETYPES" "ORDERFACTSERVICEPACKAGESOURCETYPES" ON ("TEST_ORDERFACT"."ORDERFACTSERVICEPACKAGESOURCETYPEID" = "ORDERFACTSERVICEPACKAGESOURCETYPES"."ID")
  INNER JOIN "REALTYANALYTICS"."PRODUCTCATEGORIES" "PRODUCTCATEGORIES" ON ("TEST_ORDERFACT"."TARIF" = "PRODUCTCATEGORIES"."ID")
  INNER JOIN "REALTYANALYTICS"."PRODUCTS" "PRODUCTS" ON ("TEST_ORDERFACT"."PRODUCT" = "PRODUCTS"."ID")
WHERE (("AMOCRM_MANAGERS"."NAME" = 'Саркис Ирицян')
  AND ("TEST_ORDERFACT"."USERSITEID" = 3032446)
  AND (({fn EXTRACT(MONTH FROM "TEST_ORDERFACT"."DADD")} - 1) / 3 + 1 = 3))
GROUP BY "AMOCRM_MANAGERS"."NAME",
  {fn EXTRACT(MONTH FROM "TEST_ORDERFACT"."DADD")},
  ({fn EXTRACT(MONTH FROM "TEST_ORDERFACT"."DADD")} - 1) / 3 + 1,
  {fn EXTRACT(YEAR FROM "TEST_ORDERFACT"."DADD")}
[jira] [Commented] (KYLIN-2387) A new BitmapCounter with better performance
[ https://issues.apache.org/jira/browse/KYLIN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825814#comment-15825814 ]

Dayue Gao commented on KYLIN-2387:

Commit https://github.com/apache/kylin/commit/e894465007f422d619ddeab2acd87e38fa093fd9 removes the usage of ImmutableRoaringBitmap.bitmapOf.

> A new BitmapCounter with better performance
>
> Key: KYLIN-2387
> URL: https://issues.apache.org/jira/browse/KYLIN-2387
> Project: Kylin
> Issue Type: Improvement
> Components: Metadata, Query Engine, Storage - HBase
> Affects Versions: v2.0.0
> Reporter: Dayue Gao
> Assignee: Dayue Gao
>
> We found that the old BitmapCounter does not perform well on very large bitmaps. The inefficiency comes from:
> * Poor serialize implementation: instead of serializing the bitmap directly to a ByteBuffer, it uses a ByteArrayOutputStream as temporary storage, which causes superfluous memory allocations
> * Poor peekLength implementation: the whole bitmap is deserialized in order to retrieve its serialized size
> * Extra deserialize cost: even if only cardinality info is needed to answer a query, the whole bitmap is deserialized into a MutableRoaringBitmap
>
> A new BitmapCounter is designed to solve these problems:
> * It comes in two flavors, mutable and immutable, which are based on Mutable/Immutable RoaringBitmap respectively
> * ImmutableBitmapCounter has a lower deserialize cost, as it just maps to a copied buffer. So we always deserialize to ImmutableBitmapCounter first, and convert it to MutableBitmapCounter only when necessary
> * peekLength is implemented using ImmutableRoaringBitmap.serializedSizeInBytes, which is very fast since only the header of the roaring format is examined
> * It can serialize directly to a ByteBuffer; no intermediate buffer is allocated
> * The wire format is the same as before ([RoaringFormatSpec|https://github.com/RoaringBitmap/RoaringFormatSpec/]). Therefore no cube rebuild is needed
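
The peekLength idea from the description, learning a record's serialized size from its header instead of deserializing the whole body, can be sketched on a toy layout. The Python code below is not the Roaring format and not Kylin's implementation: it assumes a made-up record consisting of a 4-byte big-endian element count followed by that many 4-byte integers, and all function names are hypothetical.

```python
import struct

# Toy header: element count as a big-endian unsigned 32-bit integer.
# This layout is an assumption for illustration, NOT RoaringFormatSpec.
HEADER = struct.Struct(">I")


def serialize(values) -> bytes:
    """Write the count header followed by the sorted distinct values."""
    vals = sorted(set(values))
    return HEADER.pack(len(vals)) + b"".join(HEADER.pack(v) for v in vals)


def peek_length(buf: bytes, offset: int = 0) -> int:
    """Size in bytes of the record at offset, read from the header only."""
    (count,) = HEADER.unpack_from(buf, offset)
    return HEADER.size + count * 4


def cardinality(buf: bytes, offset: int = 0) -> int:
    """Number of distinct values, again without touching the body."""
    (count,) = HEADER.unpack_from(buf, offset)
    return count


def deserialize(buf: bytes, offset: int = 0) -> list:
    """Full decode, needed only when the actual values are required."""
    (count,) = HEADER.unpack_from(buf, offset)
    return list(struct.unpack_from(">%dI" % count, buf, offset + HEADER.size))


if __name__ == "__main__":
    buf = serialize([5, 1, 5, 9]) + serialize([2])
    first = peek_length(buf)          # skip the first record via its header
    print(first, cardinality(buf), deserialize(buf, first))
```

In this toy layout a reader can skip records or answer a cardinality question after reading four bytes, which mirrors the motivation given for the header-based peekLength and for deferring full deserialization.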
[jira] [Updated] (KYLIN-2370) Refine unload table
[ https://issues.apache.org/jira/browse/KYLIN-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhixiong Chen updated KYLIN-2370:

Attachment: 0001-KYLIN-2370-Refine-unload-and-reload-table.patch

I have completed this issue. Please check it. Thanks!

> Refine unload table
>
> Key: KYLIN-2370
> URL: https://issues.apache.org/jira/browse/KYLIN-2370
> Project: Kylin
> Issue Type: Improvement
> Components: Web
> Reporter: Zhixiong Chen
> Assignee: Zhixiong Chen
> Attachments: 0001-KYLIN-2370-Refine-unload-and-reload-table.patch
>
> Move the Unload button to the table level.
> Add a Reload button next to it.
[jira] [Commented] (KYLIN-2387) A new BitmapCounter with better performance
[ https://issues.apache.org/jira/browse/KYLIN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825776#comment-15825776 ]

Shaofeng SHI commented on KYLIN-2387:

If it is not easy to stay compatible with the old version, we can shade the old version when building the binary package.
[jira] [Commented] (KYLIN-2387) A new BitmapCounter with better performance
[ https://issues.apache.org/jira/browse/KYLIN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825749#comment-15825749 ]

Dayue Gao commented on KYLIN-2387:

OK, I'll remove the usage of ImmutableRoaringBitmap.bitmapOf. But I'm not sure whether there are any other incompatible methods.
[jira] [Created] (KYLIN-2402) execute query hits No registered coprocessor service
answer created KYLIN-2402:

Summary: execute query hits No registered coprocessor service
Key: KYLIN-2402
URL: https://issues.apache.org/jira/browse/KYLIN-2402
Project: Kylin
Issue Type: Improvement
Environment: hadoop2.6.4 /hbase1.1.5/hive1.2.1/zk3.4.6/kylin1.6.0
Reporter: answer

Today I added kylin.coprocessor.local.jar to kylin.property. After the cube build finished, executing the query "select * from xxx" hit the error below:

org.apache.hadoop.hbase.exceptions.UnknownProtocolException: No registered coprocessor service found for name CubeVisitService in region KYLIN_GA464RXYQX,,1484635706236.9aac092cc3bd8855999c7a8734f09842.
[jira] [Commented] (KYLIN-2387) A new BitmapCounter with better performance
[ https://issues.apache.org/jira/browse/KYLIN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825722#comment-15825722 ]

Shaofeng SHI commented on KYLIN-2387:

I've been working on Spark cubing (KYLIN-2331) recently. These classes will be serialized by Kryo, which then triggers the initialization...
[jira] [Commented] (KYLIN-2387) A new BitmapCounter with better performance
[ https://issues.apache.org/jira/browse/KYLIN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825703#comment-15825703 ]

Dayue Gao commented on KYLIN-2387:

ImmutableRoaringBitmap.bitmapOf is only used in tests, so it's possible to remove its usage. But my question is: why does Kylin load the RoaringBitmap class from Spark? Is it a classpath issue?
[jira] [Commented] (KYLIN-2387) A new BitmapCounter with better performance
[ https://issues.apache.org/jira/browse/KYLIN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825684#comment-15825684 ]

Shaofeng SHI commented on KYLIN-2387:

With a build from the latest master branch, I got the following error on CDH 5.8:
{code}
Exception in thread "main" java.lang.NoSuchMethodError: org.roaringbitmap.buffer.ImmutableRoaringBitmap.bitmapOf([I)Lorg/roaringbitmap/buffer/ImmutableRoaringBitmap;
    at org.apache.kylin.measure.bitmap.ImmutableBitmapCounter.(ImmutableBitmapCounter.java:38)
    at org.apache.kylin.measure.bitmap.BitmapSerializer.(BitmapSerializer.java:28)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:278)
{code}
It worked before, so I think the error is related to this change. I also searched the machine and found that CDH 5.8 still uses an old RoaringBitmap:

find / -name RoaringBitmap*.jar
/opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/jars/RoaringBitmap-0.5.11.jar
/opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/lib/oozie/oozie-sharelib-mr1/lib/spark/RoaringBitmap-0.5.11.jar
/opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/lib/oozie/oozie-sharelib-yarn/lib/spark/RoaringBitmap-0.5.11.jar

[~dayue] can we use a method that also exists in the old version? Thanks.
[jira] [Resolved] (KYLIN-2313) Cannot find a cube in a subquery case with count distinct
[ https://issues.apache.org/jira/browse/KYLIN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyang resolved KYLIN-2313.

Resolution: Cannot Reproduce

> Cannot find a cube in a subquery case with count distinct
>
> Key: KYLIN-2313
> URL: https://issues.apache.org/jira/browse/KYLIN-2313
> Project: Kylin
> Issue Type: Bug
> Components: Query Engine
> Affects Versions: v1.6.0
> Reporter: Dong Li
> Assignee: liyang
> Priority: Minor
> Fix For: v2.0.0
>
> With sample cube,
> The first query can find a cube and give correct result:
> select p.part_dt, p.lstg_site_id, p.grp, count(distinct user_id), sum(price)
> from (
>   select t.part_dt, t.lstg_site_id, t.user_id, t.price,
>     case t.lstg_format_name when 'ABIN' then 'AAA' when 'BBIN' then 'BBB' else 'CCC' end as grp
>   from kylin_sales t) p
> group by p.part_dt, p.lstg_site_id, p.grp
>
> The second query will throw exception: cannot find any realization:
> select p.part_dt, p.lstg_site_id, p.grp, count(distinct user_id), sum(price)
> from (
>   select t.part_dt, t.lstg_site_id, t.user_id, t.price,
>     case t.lstg_format_name when 'ABIN' then 'AAA'
>       when 'BBIN' then 'BBB'
>       else 'CCC' end as grp
>   from kylin_sales t
> ) p
> where p.part_dt=DATE'2013-01-01'
> group by p.part_dt, p.lstg_site_id, p.grp
>
> Error message:
> Error while executing SQL "select p.part_dt, p.lstg_site_id, p.grp, count(distinct user_id), sum(price) from ( select t.part_dt, t.lstg_site_id, t.user_id, t.price, case t.lstg_format_name when 'ABIN' then 'AAA' when 'BBIN' then 'BBB' else 'CCC' end as grp from kylin_sales t ) p where p.part_dt='2013-01-01' group by p.part_dt, p.lstg_site_id, p.grp LIMIT 5": Can't find any realization. Please confirm with providers. SQL digest: fact table DEFAULT.KYLIN_SALES,group by [DEFAULT.KYLIN_SALES.PART_DT, DEFAULT.KYLIN_SALES.LSTG_SITE_ID, UNKNOWN_MODEL:DEFAULT._KYLIN_TABLE.GRP],filter on [DEFAULT.KYLIN_SALES.PART_DT],with aggregates[FunctionDesc [expression=COUNT_DISTINCT, parameter=ParameterDesc [type=column, value=USER_ID, nextParam=null], returnType=null], FunctionDesc [expression=SUM, parameter=ParameterDesc [type=column, value=PRICE, nextParam=null], returnType=null]].
>
> The difference between these 2 queries is: there's one condition in the second query: where p.part_dt=DATE'2013-01-01'
[jira] [Reopened] (KYLIN-2313) Cannot find a cube in a subquery case with count distinct
[ https://issues.apache.org/jira/browse/KYLIN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyang reopened KYLIN-2313: