[jira] [Created] (KYLIN-2406) TPC-H query 20 can trigger NPE

2017-01-17 Thread liyang (JIRA)
liyang created KYLIN-2406:
-

 Summary: TPC-H query 20 can trigger NPE
 Key: KYLIN-2406
 URL: https://issues.apache.org/jira/browse/KYLIN-2406
 Project: Kylin
  Issue Type: Bug
Reporter: liyang


The query below triggers an NPE:

{code}
with tmp3 as (
select l_partkey, 0.5 * sum(l_quantity) as sum_quantity, l_suppkey
from v_lineitem
inner join supplier on l_suppkey = s_suppkey
inner join nation on s_nationkey = n_nationkey
inner join part on l_partkey = p_partkey
where l_shipdate >= '1992-01-01' and l_shipdate <= '1995-01-01'
and n_name = 'CANADA'
and p_name like 'forest%'
group by l_partkey, l_suppkey
)

select
s_name,
s_address
from
v_partsupp
inner join tmp3 on ps_partkey = l_partkey and ps_suppkey = l_suppkey
inner join supplier on ps_suppkey = s_suppkey
where
ps_availqty > sum_quantity
group by
s_name, s_address
order by
s_name
{code}

The query below, however, works. The only difference is the order of 
"inner join tmp3" and "inner join supplier":

{code}
with tmp3 as (
select l_partkey, 0.5 * sum(l_quantity) as sum_quantity, l_suppkey
from v_lineitem
inner join supplier on l_suppkey = s_suppkey
inner join nation on s_nationkey = n_nationkey
inner join part on l_partkey = p_partkey
where l_shipdate >= '1992-01-01' and l_shipdate <= '1995-01-01'
and n_name = 'CANADA'
and p_name like 'forest%'
group by l_partkey, l_suppkey
)

select
s_name,
s_address
from
v_partsupp
inner join supplier on ps_suppkey = s_suppkey
inner join tmp3 on ps_partkey = l_partkey and ps_suppkey = l_suppkey
where
ps_availqty > sum_quantity
group by
s_name, s_address
order by
s_name
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KYLIN-2405) No cube detail info loaded.

2017-01-17 Thread readme_kylin (JIRA)
readme_kylin created KYLIN-2405:
---

 Summary: No cube detail info loaded.
 Key: KYLIN-2405
 URL: https://issues.apache.org/jira/browse/KYLIN-2405
 Project: Kylin
  Issue Type: Bug
Affects Versions: v1.6.0
 Environment: hadoop 2.6.4
hive 2.1.0
hbase 1.2.4
Reporter: readme_kylin


I deployed two Kylin servers. The query-mode server always throws an "Oops" 
error: "no cube detail info loaded".
Here is the kylin.log:

2017-01-18 14:41:13,514 INFO  [http-bio-7070-exec-2324] project.ProjectL2Cache:174 : Loading L2 project cache for 4399_ADGAME
2017-01-18 14:41:13,514 WARN  [http-bio-7070-exec-2324] realization.RealizationRegistry:120 : No provider for realization type INVERTED_INDEX
2017-01-18 14:41:13,515 WARN  [http-bio-7070-exec-2324] realization.RealizationRegistry:120 : No provider for realization type INVERTED_INDEX
2017-01-18 14:41:13,515 WARN  [http-bio-7070-exec-2324] realization.RealizationRegistry:120 : No provider for realization type INVERTED_INDEX
2017-01-18 14:41:13,515 WARN  [http-bio-7070-exec-2324] realization.RealizationRegistry:120 : No provider for realization type INVERTED_INDEX
2017-01-18 14:41:13,516 WARN  [http-bio-7070-exec-2324] realization.RealizationRegistry:120 : No provider for realization type INVERTED_INDEX
2017-01-18 14:41:13,516 WARN  [http-bio-7070-exec-2324] realization.RealizationRegistry:120 : No provider for realization type INVERTED_INDEX
2017-01-18 14:41:13,516 ERROR [http-bio-7070-exec-2324] controller.TableController:100 : Failed to deal with the request.
java.lang.NullPointerException
at org.apache.kylin.cube.CubeInstance.getAllColumns(CubeInstance.java:393)
at org.apache.kylin.metadata.project.ProjectL2Cache.sanityCheck(ProjectL2Cache.java:236)
at org.apache.kylin.metadata.project.ProjectL2Cache.loadCache(ProjectL2Cache.java:220)
at org.apache.kylin.metadata.project.ProjectL2Cache.getCache(ProjectL2Cache.java:167)
at org.apache.kylin.metadata.project.ProjectL2Cache.listDefinedTables(ProjectL2Cache.java:75)
at org.apache.kylin.metadata.project.ProjectManager.listDefinedTables(ProjectManager.java:405)
at org.apache.kylin.rest.controller.TableController.getHiveTables(TableController.java:98)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:221)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:104)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandleMethod(RequestMappingHandlerAdapter.java:743)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:672)
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:82)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:933)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:867)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:951)
at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:842)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:624)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:827)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:118)
at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:84)
at 

[jira] [Updated] (KYLIN-2404) Add "hive.merge.mapfiles" and "hive.merge.mapredfiles" to kylin_hive_conf.xml

2017-01-17 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-2404:

Description: 
Since 1.5.3, Kylin uses a "redistribute" step to merge small files to a 
proper size after creating the intermediate Hive table. In some users' 
environments, however, Hive's own small-file merging is enabled by default. 
That costs additional CPU and hurts cube build performance (in extreme 
cases the files are merged to 256 MB, so only a very small number of mappers 
are started for the build).

So Kylin should explicitly tell Hive to disable small-file merging when 
creating and redistributing the intermediate flat table. Adding 
"hive.merge.mapfiles" and "hive.merge.mapredfiles" to conf/kylin_hive_conf.xml 
with value "false" will solve this. The meaning of these two parameters can be 
found in 
https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration

  was:
Since 1.5.3, Kylin uses a "redistribute" step to merge small files to a 
proper size after creating the intermediate Hive table. In some users' 
environments, however, Hive's own small-file merging is enabled by default. 
That costs additional CPU and hurts cube build performance (in extreme 
cases the files are merged to 256 MB, so only a very small number of mappers 
are started for the build).

So Kylin should explicitly tell Hive to disable small-file merging when 
creating and redistributing the intermediate flat table. Will add 
"hive.merge.mapfiles" and "hive.merge.mapredfiles" to conf/kylin_hive_conf.xml 
with value "false". The meaning of these two parameters can be found in 
https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration




[jira] [Updated] (KYLIN-2404) Add "hive.merge.mapfiles" and "hive.merge.mapredfiles" to kylin_hive_conf.xml

2017-01-17 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-2404:

Description: 
Since 1.5.3, Kylin uses a "redistribute" step to merge small files to a 
proper size after creating the intermediate Hive table. In some users' 
environments, however, Hive's own small-file merging is enabled by default. 
That costs additional CPU and hurts cube build performance (in extreme 
cases the files are merged to 256 MB, so only a very small number of mappers 
are started for the build).

So Kylin should explicitly tell Hive to disable small-file merging when 
creating and redistributing the intermediate flat table. Will add 
"hive.merge.mapfiles" and "hive.merge.mapredfiles" to conf/kylin_hive_conf.xml 
with value "false". The meaning of these two parameters can be found in 
https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration

  was:
Since 1.5.3, Kylin uses a "redistribute" step to merge small files to a 
proper size after creating the intermediate Hive table. In some users' 
environments, however, Hive's own small-file merging is enabled by default. 
That costs additional CPU and hurts cube build performance (in extreme 
cases the files are merged to 256 MB, so only a very small number of mappers 
are started for the build).

So Kylin should explicitly tell Hive to disable small-file merging when 
creating and redistributing the intermediate flat table. Will add 
"hive.merge.mapfiles" and "hive.merge.mapredfiles" to conf/kylin_hive_conf.xml 
with value "false".




[jira] [Created] (KYLIN-2404) Add "hive.merge.mapfiles" and "hive.merge.mapredfiles" to kylin_hive_conf.xml

2017-01-17 Thread Shaofeng SHI (JIRA)
Shaofeng SHI created KYLIN-2404:
---

 Summary: Add "hive.merge.mapfiles" and "hive.merge.mapredfiles" to 
kylin_hive_conf.xml
 Key: KYLIN-2404
 URL: https://issues.apache.org/jira/browse/KYLIN-2404
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine
Reporter: Shaofeng SHI
Assignee: Shaofeng SHI
Priority: Minor
 Fix For: v2.0.0


Since 1.5.3, Kylin uses a "redistribute" step to merge small files to a 
proper size after creating the intermediate Hive table. In some users' 
environments, however, Hive's own small-file merging is enabled by default. 
That costs additional CPU and hurts cube build performance (in extreme 
cases the files are merged to 256 MB, so only a very small number of mappers 
are started for the build).

So Kylin should explicitly tell Hive to disable small-file merging when 
creating and redistributing the intermediate flat table. Will add 
"hive.merge.mapfiles" and "hive.merge.mapredfiles" to conf/kylin_hive_conf.xml 
with value "false".
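
A minimal sketch of what the proposed entries might look like in 
conf/kylin_hive_conf.xml, assuming the file uses the usual Hadoop-style 
property layout. Only the two property names and the value "false" come from 
this issue; the description texts are illustrative.

```xml
<!-- Assumed to sit inside the existing <configuration> element
     of conf/kylin_hive_conf.xml -->
<property>
    <name>hive.merge.mapfiles</name>
    <value>false</value>
    <description>Illustrative: do not merge small files at the end of a map-only job</description>
</property>
<property>
    <name>hive.merge.mapredfiles</name>
    <value>false</value>
    <description>Illustrative: do not merge small files at the end of a map-reduce job</description>
</property>
```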





[jira] [Created] (KYLIN-2403) tableau extract month in where

2017-01-17 Thread Pavel Tarasov (JIRA)
Pavel Tarasov created KYLIN-2403:


 Summary: tableau extract month in where
 Key: KYLIN-2403
 URL: https://issues.apache.org/jira/browse/KYLIN-2403
 Project: Kylin
  Issue Type: Bug
  Components: 3rd Party
Affects Versions: v1.5.4.1
Reporter: Pavel Tarasov


I have a problem connecting Tableau to Kylin. When I create a filter on month 
in Tableau, it generates a query with this filter:

WHERE (({fn EXTRACT(MONTH  FROM "TEST_ORDERFACT"."DADD")} - 1) / 3 + 1 = 2).


Detailed query example from tableau:

SELECT "AMOCRM_MANAGERS"."NAME" AS "NAME__AMOCRM_MANAGERS_",
 {fn EXTRACT(MONTH FROM "TEST_ORDERFACT"."DADD")} AS "mn_DADD_ok",
 ({fn EXTRACT(MONTH FROM "TEST_ORDERFACT"."DADD")} - 1) / 3 + 1 AS "qr_DADD_ok",
 SUM("TEST_ORDERFACT"."AMOUNT") AS "sum_AMOUNT_ok",
 {fn EXTRACT(YEAR FROM "TEST_ORDERFACT"."DADD")} AS "yr_DADD_ok"
FROM "PTARASOV"."TEST_ORDERFACT" "TEST_ORDERFACT"
 INNER JOIN "REALTYANALYTICS"."CLIENTCATEGORIES" "CLIENTCATEGORIES" ON 
("TEST_ORDERFACT"."CLIENTCATEGORY" = "CLIENTCATEGORIES"."ID")
 INNER JOIN "REALTYANALYTICS"."AMOCRM_MANAGERS" "AMOCRM_MANAGERS" ON 
("TEST_ORDERFACT"."MANAGER" = "AMOCRM_MANAGERS"."ID")
 LEFT JOIN "REALTYANALYTICS"."LOCATIONS" "LOCATIONS" ON 
("TEST_ORDERFACT"."REGION" = "LOCATIONS"."ID")
 LEFT JOIN "REALTYANALYTICS"."ORDERFACTSERVICEPACKAGESOURCETYPES" 
"ORDERFACTSERVICEPACKAGESOURCETYPES" ON 
("TEST_ORDERFACT"."ORDERFACTSERVICEPACKAGESOURCETYPEID" = 
"ORDERFACTSERVICEPACKAGESOURCETYPES"."ID")
 INNER JOIN "REALTYANALYTICS"."PRODUCTCATEGORIES" "PRODUCTCATEGORIES" ON 
("TEST_ORDERFACT"."TARIF" = "PRODUCTCATEGORIES"."ID")
 INNER JOIN "REALTYANALYTICS"."PRODUCTS" "PRODUCTS" ON 
("TEST_ORDERFACT"."PRODUCT" = "PRODUCTS"."ID")
WHERE (("AMOCRM_MANAGERS"."NAME" = 'Саркис Ирицян') AND 
("TEST_ORDERFACT"."USERSITEID" = 3032446) AND (({fn EXTRACT(MONTH FROM 
"TEST_ORDERFACT"."DADD")} - 1) / 3 + 1 = 3))
GROUP BY "AMOCRM_MANAGERS"."NAME",
 {fn EXTRACT(MONTH FROM "TEST_ORDERFACT"."DADD")},
 ({fn EXTRACT(MONTH FROM "TEST_ORDERFACT"."DADD")} - 1) / 3 + 1,
 {fn EXTRACT(YEAR FROM "TEST_ORDERFACT"."DADD")}
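
The filter expression in the query above maps a month to a quarter. A minimal 
Java sketch of that arithmetic follows; the class and method names are 
hypothetical, and it assumes EXTRACT(MONTH ...) yields an integer from 1 to 12 
and that "/" is evaluated as integer division.

```java
// Hypothetical helper, not part of Kylin or Tableau.
final class QuarterOfMonth {

    // Mirrors ({fn EXTRACT(MONTH FROM d)} - 1) / 3 + 1 using integer division.
    static int quarterOf(int month) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month must be 1..12, got " + month);
        }
        return (month - 1) / 3 + 1;
    }

    public static void main(String[] args) {
        // Print the month-to-quarter mapping for the whole year.
        for (int month = 1; month <= 12; month++) {
            System.out.println("month " + month + " -> quarter " + quarterOf(month));
        }
    }
}
```

If the engine evaluated "/" as decimal division instead, the expression would 
only equal a whole number for months 1, 4, 7 and 10, so the filter would 
silently drop the other months.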





[jira] [Commented] (KYLIN-2387) A new BitmapCounter with better performance

2017-01-17 Thread Dayue Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825814#comment-15825814
 ] 

Dayue Gao commented on KYLIN-2387:
--

Commit 
https://github.com/apache/kylin/commit/e894465007f422d619ddeab2acd87e38fa093fd9 
removes the usage of ImmutableRoaringBitmap.bitmapOf.

> A new BitmapCounter with better performance
> ---
>
> Key: KYLIN-2387
> URL: https://issues.apache.org/jira/browse/KYLIN-2387
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata, Query Engine, Storage - HBase
>Affects Versions: v2.0.0
>Reporter: Dayue Gao
>Assignee: Dayue Gao
>
> We found the old BitmapCounter does not perform very well on very large 
> bitmaps. The inefficiency comes from
> * Poor serialize implementation: instead of serializing the bitmap directly 
> to ByteBuffer, it uses ByteArrayOutputStream as temporary storage, which 
> causes superfluous memory allocations
> * Poor peekLength implementation: the whole bitmap is deserialized in order 
> to retrieve its serialized size
> * Extra deserialize cost: even if only cardinality info is needed to answer 
> the query, the whole bitmap is deserialized into MutableRoaringBitmap
> A new BitmapCounter is designed to solve these problems
> * It comes in two flavors, mutable and immutable, which are based on 
> Mutable/Immutable RoaringBitmap respectively
> * ImmutableBitmapCounter has lower deserialize cost, as it just maps to a 
> copied buffer. So we always deserialize to ImmutableBitmapCounter at first, 
> and convert it to MutableBitmapCounter only when necessary
> * peekLength is implemented using 
> ImmutableRoaringBitmap.serializedSizeInBytes, which is very fast since only 
> the header of the roaring format is examined
> * It serializes directly to ByteBuffer; no intermediate buffer is 
> allocated
> * The wire format is the same as before 
> ([RoaringFormatSpec|https://github.com/RoaringBitmap/RoaringFormatSpec/]). 
> Therefore no cube rebuild is needed
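
Two of the ideas in the description, serializing straight into a ByteBuffer 
and computing the serialized length from the header alone, can be sketched 
with plain java.nio. This is only an illustration under stated assumptions: 
the class name and the toy wire format (a count followed by that many ints) 
are hypothetical and are NOT the Roaring format or Kylin's actual code.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; toy format: [int count][count x int value].
final class HeaderPeekSketch {

    // Writes directly into the target buffer, with no intermediate byte array.
    static void serialize(int[] values, ByteBuffer out) {
        out.putInt(values.length);
        for (int value : values) {
            out.putInt(value);
        }
    }

    // Returns the serialized size by reading only the header.
    // Uses an absolute get, so the buffer position is left unchanged.
    static int peekLength(ByteBuffer in) {
        int count = in.getInt(in.position());
        return Integer.BYTES + count * Integer.BYTES;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(64);
        serialize(new int[] {3, 7, 42}, buffer);
        buffer.flip();
        // Header (4 bytes) plus three ints (12 bytes): prints 16.
        System.out.println("serialized size = " + peekLength(buffer) + " bytes");
    }
}
```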





[jira] [Updated] (KYLIN-2370) Refine unload table

2017-01-17 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated KYLIN-2370:
-
Attachment: 0001-KYLIN-2370-Refine-unload-and-reload-table.patch

I have completed this issue. Please check it. Thanks!

> Refine unload table
> ---
>
> Key: KYLIN-2370
> URL: https://issues.apache.org/jira/browse/KYLIN-2370
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Reporter: Zhixiong Chen
>Assignee: Zhixiong Chen
> Attachments: 0001-KYLIN-2370-Refine-unload-and-reload-table.patch
>
>
> Move the Unload button to the table level.
> Add a Reload button next to it.





[jira] [Commented] (KYLIN-2387) A new BitmapCounter with better performance

2017-01-17 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825776#comment-15825776
 ] 

Shaofeng SHI commented on KYLIN-2387:
-

If it is not easy to stay compatible with the old version, we can shade the 
old version when building the binary package.



[jira] [Commented] (KYLIN-2387) A new BitmapCounter with better performance

2017-01-17 Thread Dayue Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825749#comment-15825749
 ] 

Dayue Gao commented on KYLIN-2387:
--

OK, I'll remove the usage of ImmutableRoaringBitmap.bitmapOf. But I'm not sure 
if there are any other incompatible methods.



[jira] [Created] (KYLIN-2402) execute query hits No registered coprocessor service

2017-01-17 Thread answer (JIRA)
answer created KYLIN-2402:
-

 Summary: execute query hits No registered coprocessor service
 Key: KYLIN-2402
 URL: https://issues.apache.org/jira/browse/KYLIN-2402
 Project: Kylin
  Issue Type: Improvement
 Environment: hadoop2.6.4 /hbase1.1.5/hive1.2.1/zk3.4.6/kylin1.6.0
Reporter: answer


Today I added kylin.coprocessor.local.jar to kylin.properties. After the cube 
build finished, I executed the query "select * from xxx" and hit the error below:
 org.apache.hadoop.hbase.exceptions.UnknownProtocolException: No registered 
coprocessor service found for name CubeVisitService in region 
KYLIN_GA464RXYQX,,1484635706236.9aac092cc3bd8855999c7a8734f09842.






[jira] [Commented] (KYLIN-2387) A new BitmapCounter with better performance

2017-01-17 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825722#comment-15825722
 ] 

Shaofeng SHI commented on KYLIN-2387:
-

I've been working on Spark cubing (KYLIN-2331) recently; these classes are 
serialized by Kryo, which then triggers the initialization... 



[jira] [Commented] (KYLIN-2387) A new BitmapCounter with better performance

2017-01-17 Thread Dayue Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825703#comment-15825703
 ] 

Dayue Gao commented on KYLIN-2387:
--

ImmutableRoaringBitmap.bitmapOf is only used in tests, so it's possible to 
remove the usage of it.

But my question is, why does Kylin load the RoaringBitmap class from Spark? Is 
it a classpath issue?



[jira] [Commented] (KYLIN-2387) A new BitmapCounter with better performance

2017-01-17 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825684#comment-15825684
 ] 

Shaofeng SHI commented on KYLIN-2387:
-

With a build from latest master branch, I got the following error on CDH 5.8:
{code}
Exception in thread "main" java.lang.NoSuchMethodError: org.roaringbitmap.buffer.ImmutableRoaringBitmap.bitmapOf([I)Lorg/roaringbitmap/buffer/ImmutableRoaringBitmap;
at org.apache.kylin.measure.bitmap.ImmutableBitmapCounter.(ImmutableBitmapCounter.java:38)
at org.apache.kylin.measure.bitmap.BitmapSerializer.(BitmapSerializer.java:28)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
{code}

It worked before, so I think the error is related to this change. I also 
searched the machine and found that CDH 5.8 still uses an old RoaringBitmap:

find / -name RoaringBitmap*.jar
/opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/jars/RoaringBitmap-0.5.11.jar
/opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/lib/oozie/oozie-sharelib-mr1/lib/spark/RoaringBitmap-0.5.11.jar
/opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p0.2/lib/oozie/oozie-sharelib-yarn/lib/spark/RoaringBitmap-0.5.11.jar


[~dayue] can we use a method that also exists in the old version? Thanks.
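
One way to confirm this kind of version mismatch is to probe the runtime 
classpath with reflection. The sketch below is only an illustration: the class 
name ClasspathProbe and its helpers are hypothetical, and the RoaringBitmap 
class and method names are taken from the stack trace above. It reports 
whether a public method with the given signature is visible and which jar a 
class was loaded from.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Kylin.
final class ClasspathProbe {

    // True if the named class is visible and declares or inherits a public
    // method with this name and parameter types.
    static boolean hasMethod(String className, String method, Class<?>... params) {
        try {
            // initialize=false: look up the class without running static initializers.
            Class<?> type = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            type.getMethod(method, params);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    // Location (usually a jar URL) the class was loaded from, if known.
    static String loadedFrom(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return source == null ? "(bootstrap or unknown)" : String.valueOf(source.getLocation());
    }

    public static void main(String[] args) {
        // bitmapOf([I) in the error message means a single int[] (varargs) parameter.
        boolean present = hasMethod(
                "org.roaringbitmap.buffer.ImmutableRoaringBitmap", "bitmapOf", int[].class);
        System.out.println("ImmutableRoaringBitmap.bitmapOf(int...) visible: " + present);
        System.out.println("ClasspathProbe loaded from: " + loadedFrom(ClasspathProbe.class));
    }
}
```

Running such a probe with the same classpath as the failing job would show 
whether the RoaringBitmap jar shipped with CDH shadows the newer one.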



[jira] [Resolved] (KYLIN-2313) Cannot find a cube in a subquery case with count distinct

2017-01-17 Thread liyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyang resolved KYLIN-2313.
---
Resolution: Cannot Reproduce

> Cannot find a cube in a subquery case with count distinct
> -
>
> Key: KYLIN-2313
> URL: https://issues.apache.org/jira/browse/KYLIN-2313
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v1.6.0
>Reporter: Dong Li
>Assignee: liyang
>Priority: Minor
> Fix For: v2.0.0
>
>
> With the sample cube, 
> the first query finds a cube and gives the correct result:
> select p.part_dt, p.lstg_site_id, p.grp, count(distinct user_id), sum(price) 
> from (
> select t.part_dt, t.lstg_site_id, t.user_id, t.price,
> case t.lstg_format_name when 'ABIN' then 'AAA' when 'BBIN' then 'BBB' else 
> 'CCC' end as grp from kylin_sales t) p 
> group by p.part_dt, p.lstg_site_id, p.grp
> The second query throws an exception (cannot find any realization):
> select p.part_dt, p.lstg_site_id, p.grp, count(distinct user_id), sum(price) 
> from (
> select t.part_dt, t.lstg_site_id, t.user_id, t.price,
> case t.lstg_format_name when 'ABIN' then 'AAA'
> when 'BBIN' then 'BBB'
> else 'CCC' end as grp
> from kylin_sales t
> ) p 
> where p.part_dt=DATE'2013-01-01'
> group by p.part_dt, p.lstg_site_id, p.grp
> Error message:
> Error while executing SQL "select p.part_dt, p.lstg_site_id, p.grp, 
> count(distinct user_id), sum(price) from ( select t.part_dt, t.lstg_site_id, 
> t.user_id, t.price, case t.lstg_format_name when 'ABIN' then 'AAA' when 
> 'BBIN' then 'BBB' else 'CCC' end as grp from kylin_sales t ) p where 
> p.part_dt='2013-01-01' group by p.part_dt, p.lstg_site_id, p.grp LIMIT 
> 5": Can't find any realization. Please confirm with providers. SQL 
> digest: fact table DEFAULT.KYLIN_SALES,group by [DEFAULT.KYLIN_SALES.PART_DT, 
> DEFAULT.KYLIN_SALES.LSTG_SITE_ID, 
> UNKNOWN_MODEL:DEFAULT._KYLIN_TABLE.GRP],filter on 
> [DEFAULT.KYLIN_SALES.PART_DT],with aggregates[FunctionDesc 
> [expression=COUNT_DISTINCT, parameter=ParameterDesc [type=column, 
> value=USER_ID, nextParam=null], returnType=null], FunctionDesc 
> [expression=SUM, parameter=ParameterDesc [type=column, value=PRICE, 
> nextParam=null], returnType=null]].
> The only difference between the two queries is the extra condition in the 
> second query: where p.part_dt=DATE'2013-01-01'





[jira] [Reopened] (KYLIN-2313) Cannot find a cube in a subquery case with count distinct

2017-01-17 Thread liyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyang reopened KYLIN-2313:
---


