[jira] [Updated] (KYLIN-3114) Enable kylin.web.query-timeout for web query request

2018-01-28 Thread Billy Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billy Liu updated KYLIN-3114:
-
Summary: Enable kylin.web.query-timeout for web query request  (was: Make 
timeout for the queries submitted through the Web UI configurable)

> Enable kylin.web.query-timeout for web query request
> 
>
> Key: KYLIN-3114
> URL: https://issues.apache.org/jira/browse/KYLIN-3114
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: Vsevolod Ostapenko
>Priority: Minor
> Fix For: v2.3.0
>
> Attachments: KYLIN-3114.master.002.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Currently query.js hard-codes the timeout for queries submitted via the Web 
> UI to 300,000 milliseconds.
> Depending on the situation, this default can be either too large or too 
> small, especially when a query does not hit any cube and is passed through to 
> Hive or Impala.
> The query timeout should be made configurable via kylin.properties.
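The proposed change can be sketched as follows. This is a minimal illustrative sketch, not the actual web-tier code (the class and method names are hypothetical; only the property name kylin.web.query-timeout comes from the issue): read the property and fall back to the old hard-coded default when it is absent.

```java
import java.util.Properties;

public class WebQueryTimeout {
    // Current hard-coded default in query.js
    static final long DEFAULT_TIMEOUT_MS = 300_000L;

    // Resolve the timeout from kylin.properties, falling back to the default
    static long resolveTimeout(Properties kylinProps) {
        String v = kylinProps.getProperty("kylin.web.query-timeout");
        return (v == null || v.isEmpty()) ? DEFAULT_TIMEOUT_MS : Long.parseLong(v);
    }
}
```

With no property set, the old 300,000 ms behavior is preserved, so existing deployments are unaffected.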



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KYLIN-3124) Support horizontal scroll bar in 'Insight'

2018-01-28 Thread Billy Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billy Liu resolved KYLIN-3124.
--
Resolution: Fixed

> Support horizontal scroll bar in 'Insight'
> --
>
> Key: KYLIN-3124
> URL: https://issues.apache.org/jira/browse/KYLIN-3124
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Affects Versions: v2.2.0
> Environment: Windows, Chrome 63.0.3239.108, Firefox 57.0.2 
>Reporter: atul
>Assignee: Zhixiong Chen
>Priority: Minor
> Fix For: v2.3.0
>
> Attachments: horizontal_scrollbar_missing.jpg
>
>
> As shown in the attached screenshot, due to the missing horizontal scrollbar 
> in the Kylin Web UI (http://:7070/kylin/query), it is difficult to view the 
> full name of a cube after executing a query from the "Insight" view.





[jira] [Commented] (KYLIN-3182) Update Kylin help menu links

2018-01-28 Thread Billy Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342996#comment-16342996
 ] 

Billy Liu commented on KYLIN-3182:
--

Thanks [~Zhixiong Chen], I will update the links to docs23 later. 

> Update Kylin help menu links
> 
>
> Key: KYLIN-3182
> URL: https://issues.apache.org/jira/browse/KYLIN-3182
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Affects Versions: v2.3.0
>Reporter: peng.jianhua
>Assignee: Zhixiong Chen
>Priority: Minor
>  Labels: patch
> Fix For: v2.3.0
>
> Attachments: 0001-KYLIN-3182-Kylin-Help-is-not-available.patch, 
> 0001-KYLIN-3182-complete-web-help-configuration.patch, help_blank.png
>
>
> Kylin Help is not available; it looks like
>   !help_blank.png! 
> I analysed the web code and found the problem.
> The default help entries are configured in 'kylin-defaults.properties' as 
> follows:
> {code:java}
> kylin.web.help.length=4
> kylin.web.help.0=start|Getting Started|
> kylin.web.help.1=odbc|ODBC Driver|
> kylin.web.help.2=tableau|Tableau Guide|
> kylin.web.help.3=onboard|Cube Design Tutorial|
> {code}
> None of the links are configured, and the web page checks whether each link 
> is empty; if the link is empty, the item under 'Help' is not shown.
> So I modified the web code: if the link of a help document is null, the entry 
> is discarded instead of being turned into a link to the header page.
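The filtering described above can be sketched as follows. This is a hypothetical Java rendering (the real change is in the web JavaScript); it assumes each kylin.web.help.N entry has the "key|title|link" shape shown in the properties block and drops entries whose link field is empty.

```java
import java.util.ArrayList;
import java.util.List;

public class HelpMenu {
    // Keep only help entries that actually have a link configured
    static List<String[]> visibleItems(List<String> entries) {
        List<String[]> items = new ArrayList<>();
        for (String e : entries) {
            String[] parts = e.split("\\|", -1); // -1 keeps a trailing empty link field
            // Discard entries with an empty link instead of rendering a dead menu item
            if (parts.length == 3 && !parts[2].isEmpty()) {
                items.add(parts);
            }
        }
        return items;
    }
}
```

For example, "start|Getting Started|" (empty third field, as in kylin-defaults.properties) would be dropped, while an entry with a configured link would be kept.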





[jira] [Updated] (KYLIN-3207) Add blog Superset integrate Kylin

2018-01-28 Thread Billy Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billy Liu updated KYLIN-3207:
-
Fix Version/s: v2.3.0

> Add blog Superset integrate Kylin
> -
>
> Key: KYLIN-3207
> URL: https://issues.apache.org/jira/browse/KYLIN-3207
> Project: Kylin
>  Issue Type: Improvement
>Reporter: yongjie zhao
>Assignee: yongjie zhao
>Priority: Minor
> Fix For: v2.3.0
>
> Attachments: 0001-Add-blog-superset-with-kylin.patch
>
>
> Add blog Superset integrate Kylin





[jira] [Updated] (KYLIN-3208) How to enable custom rule

2018-01-28 Thread Billy Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billy Liu updated KYLIN-3208:
-
Issue Type: New Feature  (was: Improvement)

> How to enable custom rule
> -
>
> Key: KYLIN-3208
> URL: https://issues.apache.org/jira/browse/KYLIN-3208
> Project: Kylin
>  Issue Type: New Feature
>  Components: Job Engine
>Affects Versions: v2.1.0
>Reporter: Manoj kumar
>Assignee: Shaofeng SHI
>Priority: Major
>
> -    How to enable a custom rule for data transformation: while 
> creating the data model, there should be a way for the user to define rules 
> on the data being selected. Before building the cube, some rules need to be 
> applied based on the user. Most OLAP solutions, such as Essbase/TM1, provide 
> this feature.
> -    Entitlement of data: how to do that in Kylin?





[jira] [Commented] (KYLIN-2929) speed up Dump file performance

2018-01-28 Thread fengYu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342967#comment-16342967
 ] 

fengYu commented on KYLIN-2929:
---

Uploaded a new patch; please review it when you have time.

> speed up Dump file performance
> --
>
> Key: KYLIN-2929
> URL: https://issues.apache.org/jira/browse/KYLIN-2929
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.0.0
>Reporter: fengYu
>Assignee: fengYu
>Priority: Major
>  Labels: Performance
> Fix For: v2.3.0
>
> Attachments: 
> 0001-KYLIN-2929-speed-up-dump-performance-write-dump-file.patch
>
>
> While working on KYLIN-2926, I found that the coprocessor dumps to disk once 
> estimatedMemSize is bigger than spillThreshold, yet the spilled data size is 
> far smaller than estimatedMemSize; in my case the dump file size is about 
> 8 MB while spillThreshold is set to 3 GB.
> So I try to keep the spill data in memory rather than writing the file to 
> disk immediately; when the in-memory spill data reaches the threshold, all 
> spill files are written together.
> In my case, coprocessor processing time dropped from 22s to 16s, about a 30% 
> improvement.
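The buffering idea above can be sketched as follows. This is a hypothetical simplification, not the actual coprocessor code (class and method names are illustrative): small spills accumulate in memory and only reach the disk once their total size crosses the threshold.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

public class BufferedSpiller {
    private final long flushThreshold;                 // e.g. spillThreshold from config
    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private final OutputStream disk;                   // the real dump file

    BufferedSpiller(long flushThreshold, OutputStream disk) {
        this.flushThreshold = flushThreshold;
        this.disk = disk;
    }

    // Keep each small spill in memory; only hit the disk once the
    // accumulated size reaches the threshold.
    void spill(byte[] chunk) throws IOException {
        pending.add(chunk);
        pendingBytes += chunk.length;
        if (pendingBytes >= flushThreshold) {
            flush();
        }
    }

    // Write all buffered spills together in one pass
    void flush() throws IOException {
        for (byte[] c : pending) {
            disk.write(c);
        }
        pending.clear();
        pendingBytes = 0;
    }
}
```

The design trades a bounded amount of extra heap (at most flushThreshold bytes of pending spills) for far fewer small disk writes, which matches the 8 MB-vs-3 GB gap described in the issue.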





[jira] [Updated] (KYLIN-2929) speed up Dump file performance

2018-01-28 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-2929:
--
Attachment: 0001-KYLIN-2929-speed-up-dump-performance-write-dump-file.patch

> speed up Dump file performance
> --
>
> Key: KYLIN-2929
> URL: https://issues.apache.org/jira/browse/KYLIN-2929
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.0.0
>Reporter: fengYu
>Assignee: fengYu
>Priority: Major
>  Labels: Performance
> Fix For: v2.3.0
>
> Attachments: 
> 0001-KYLIN-2929-speed-up-dump-performance-write-dump-file.patch
>
>
> While working on KYLIN-2926, I found that the coprocessor dumps to disk once 
> estimatedMemSize is bigger than spillThreshold, yet the spilled data size is 
> far smaller than estimatedMemSize; in my case the dump file size is about 
> 8 MB while spillThreshold is set to 3 GB.
> So I try to keep the spill data in memory rather than writing the file to 
> disk immediately; when the in-memory spill data reaches the threshold, all 
> spill files are written together.
> In my case, coprocessor processing time dropped from 22s to 16s, about a 30% 
> improvement.





[jira] [Updated] (KYLIN-2929) speed up Dump file performance

2018-01-28 Thread fengYu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fengYu updated KYLIN-2929:
--
Attachment: (was: 
0002-KYLIN-2929-speed-up-dump-performance-write-dump-file.patch)

> speed up Dump file performance
> --
>
> Key: KYLIN-2929
> URL: https://issues.apache.org/jira/browse/KYLIN-2929
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.0.0
>Reporter: fengYu
>Assignee: fengYu
>Priority: Major
>  Labels: Performance
> Fix For: v2.3.0
>
> Attachments: 
> 0001-KYLIN-2929-speed-up-dump-performance-write-dump-file.patch
>
>
> While working on KYLIN-2926, I found that the coprocessor dumps to disk once 
> estimatedMemSize is bigger than spillThreshold, yet the spilled data size is 
> far smaller than estimatedMemSize; in my case the dump file size is about 
> 8 MB while spillThreshold is set to 3 GB.
> So I try to keep the spill data in memory rather than writing the file to 
> disk immediately; when the in-memory spill data reaches the threshold, all 
> spill files are written together.
> In my case, coprocessor processing time dropped from 22s to 16s, about a 30% 
> improvement.





[jira] [Created] (KYLIN-3208) How to enable custom rule

2018-01-28 Thread Manoj kumar (JIRA)
Manoj kumar created KYLIN-3208:
--

 Summary: How to enable custom rule
 Key: KYLIN-3208
 URL: https://issues.apache.org/jira/browse/KYLIN-3208
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine
Affects Versions: v2.1.0
Reporter: Manoj kumar
Assignee: Shaofeng SHI


-    How to enable a custom rule for data transformation: while creating 
the data model, there should be a way for the user to define rules on the data 
being selected. Before building the cube, some rules need to be applied based 
on the user. Most OLAP solutions, such as Essbase/TM1, provide this feature.

-    Entitlement of data: how to do that in Kylin?





[jira] [Commented] (KYLIN-3201) java.lang.RuntimeException: native lz4 library not available

2018-01-28 Thread Kaige Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342919#comment-16342919
 ] 

 Kaige Liu commented on KYLIN-3201:
---

Try this:

Add "*kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=2.6.2.3-1 
-Djava.library.path=/usr/hdp/current/hadoop-client/lib/native/*" to 
kylin.properties.

The above example is for HDP 2.6; adjust it for your Hadoop distribution.

 

> java.lang.RuntimeException: native lz4 library not available
> 
>
> Key: KYLIN-3201
> URL: https://issues.apache.org/jira/browse/KYLIN-3201
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.1.0
>Reporter: Keith Chen
>Priority: Critical
>
> When I build the cube with Spark, the job fails.
> It reports some exceptions about lz4, but I did not set any lz4 properties.
>  
> 18/01/26 13:18:40 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 
> (TID 0, executor 1): java.lang.RuntimeException: native lz4 library not 
> available
>  at 
> org.apache.hadoop.io.compress.Lz4Codec.getDecompressorType(Lz4Codec.java:195)
>  at 
> org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176)
>  at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1983)
>  at 
> org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1878)
>  at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1827)
>  at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1841)
>  at 
> org.apache.hadoop.mapred.SequenceFileRecordReader.(SequenceFileRecordReader.java:49)
>  at 
> org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
>  at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:252)
>  at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:251)
>  at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:211)
>  at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>  at org.apache.spark.scheduler.Task.run(Task.scala:99)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  
> java.lang.RuntimeException: error execute 
> org.apache.kylin.engine.spark.SparkCubingByLayer
>  at 
> org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
>  at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
>  Caused by: org.apache.spark.SparkException: Job aborted due to stage 
> failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 
> 2.3 in stage 0.0 (TID 9, executor 1): java.lang.RuntimeException: native lz4 
> library not available
>  at 
> 

[jira] [Assigned] (KYLIN-3206) Consider SQL expression aggregation when applying limit push down

2018-01-28 Thread Billy Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billy Liu reassigned KYLIN-3206:


Assignee: Paul Lin  (was: liyang)

> Consider SQL expression aggregation when applying limit push down
> -
>
> Key: KYLIN-3206
> URL: https://issues.apache.org/jira/browse/KYLIN-3206
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v1.6.0, v2.2.0
> Environment: Kylin 1.6.0
>Reporter: Paul Lin
>Assignee: Paul Lin
>Priority: Major
>
> SQL like "select floor(user_level /10), sum(money) group by floor(user_level 
> /10) limit 10" should not trigger storage limit push down, because 
> expressions like "floor(user_level /10)" are a form of post-aggregation. 
> Otherwise Kylin would aggregate based on partial records, producing 
> inaccurate results. 
>  
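A small sketch of why the push-down is unsafe here (illustrative Java, not Kylin code): pushing the LIMIT below the expression aggregation makes the sums cover only the first few storage records, so the per-bucket totals come out wrong.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LimitPushDownDemo {
    // row[0] = user_level, row[1] = money
    static Map<Integer, Long> aggregate(List<int[]> rows) {
        Map<Integer, Long> sums = new TreeMap<>();
        for (int[] r : rows) {
            // group by floor(user_level / 10)
            sums.merge(r[0] / 10, (long) r[1], Long::sum);
        }
        return sums;
    }

    // Wrong plan: LIMIT pushed below the expression aggregation,
    // so the sums are computed over only the first `limit` records.
    static Map<Integer, Long> limitThenAggregate(List<int[]> rows, int limit) {
        return aggregate(rows.subList(0, Math.min(limit, rows.size())));
    }
}
```

With rows {5, 10}, {7, 20}, {15, 30}, the correct sum for bucket 0 is 30, but a pushed-down limit of 1 yields only 10 for that bucket.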





[jira] [Resolved] (KYLIN-3202) Doc directory for 2.3

2018-01-28 Thread Billy Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billy Liu resolved KYLIN-3202.
--
   Resolution: Fixed
 Assignee: Billy Liu
Fix Version/s: v2.3.0

Commit: http://git-wip-us.apache.org/repos/asf/kylin/commit/40a53fe3

> Doc directory for 2.3
> -
>
> Key: KYLIN-3202
> URL: https://issues.apache.org/jira/browse/KYLIN-3202
> Project: Kylin
>  Issue Type: Sub-task
>Reporter: Billy Liu
>Assignee: Billy Liu
>Priority: Major
> Fix For: v2.3.0
>
>






[jira] [Created] (KYLIN-3207) Add blog Superset integrate Kylin

2018-01-28 Thread yongjie zhao (JIRA)
yongjie zhao created KYLIN-3207:
---

 Summary: Add blog Superset integrate Kylin
 Key: KYLIN-3207
 URL: https://issues.apache.org/jira/browse/KYLIN-3207
 Project: Kylin
  Issue Type: Improvement
Reporter: yongjie zhao
Assignee: yongjie zhao
 Attachments: 0001-Add-blog-superset-with-kylin.patch

Add blog Superset integrate Kylin





[jira] [Assigned] (KYLIN-3198) More Chinese Howto Documents

2018-01-28 Thread Billy Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billy Liu reassigned KYLIN-3198:


Assignee: Paul Lin

> More Chinese Howto Documents
> 
>
> Key: KYLIN-3198
> URL: https://issues.apache.org/jira/browse/KYLIN-3198
> Project: Kylin
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Billy Liu
>Assignee: Paul Lin
>Priority: Minor
> Fix For: v2.3.0
>
> Attachments: 80.patch
>
>
> From [https://github.com/apache/kylin/pull/80]
> Five how to documents are added:
>  
> {{[added Chinese version of 
> howto_optimize_build|https://github.com/apache/kylin/pull/80/commits/25206674f9eda19417a3c5d5f15dae8adcdb6952]}}
>  
> {{[added Chinese version of 
> howto_backup_metadata|https://github.com/apache/kylin/pull/80/commits/e773c00b030c29a79419403a86430715f1be57c3]}}
>  
> [added Chinese version of 
> howto_build_cube_with_restapi|https://github.com/apache/kylin/pull/80/commits/087fb355b74f34cd9cc345ad4c3d76df11fd3c97]
>  
> {{[added Chinese version of 
> howto_cleanup_storage|https://github.com/apache/kylin/pull/80/commits/7f2c690267b5a2b8c459e3ad07691c280ffd4e8c]}}
>  
> {{[added Chinese version of 
> howto_jdbc|https://github.com/apache/kylin/pull/80/commits/38477271eed350d9576f484b414ddf2576d54d06]}}





[jira] [Updated] (KYLIN-3201) java.lang.RuntimeException: native lz4 library not available

2018-01-28 Thread Keith Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keith Chen updated KYLIN-3201:
--
Description: 
When I build the cube with Spark, the job fails.

It reports some exceptions about lz4, but I did not set any lz4 properties.

 

18/01/26 13:18:40 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 
(TID 0, executor 1): java.lang.RuntimeException: native lz4 library not 
available
 at 
org.apache.hadoop.io.compress.Lz4Codec.getDecompressorType(Lz4Codec.java:195)
 at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176)
 at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1983)
 at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1878)
 at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1827)
 at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1841)
 at 
org.apache.hadoop.mapred.SequenceFileRecordReader.(SequenceFileRecordReader.java:49)
 at 
org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
 at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:252)
 at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:251)
 at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:211)
 at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
 at org.apache.spark.scheduler.Task.run(Task.scala:99)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)

 

java.lang.RuntimeException: error execute 
org.apache.kylin.engine.spark.SparkCubingByLayer
 at 
org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
 at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
 Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in stage 
0.0 (TID 9, executor 1): java.lang.RuntimeException: native lz4 library not 
available
 at 
org.apache.hadoop.io.compress.Lz4Codec.getDecompressorType(Lz4Codec.java:195)
 at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176)
 at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1983)
 at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1878)
 at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1827)
 at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1841)
 at 
org.apache.hadoop.mapred.SequenceFileRecordReader.(SequenceFileRecordReader.java:49)
 at 
org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
 at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:252)
 at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:251)
 at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:211)
 at 

[jira] [Commented] (KYLIN-2893) Missing zero check for totalHitFrequency in CuboidStats ctor

2018-01-28 Thread Zhong Yanghong (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342823#comment-16342823
 ] 

Zhong Yanghong commented on KYLIN-2893:
---

Hi [~liukaige], based on the logic that calculates {{totalHitFrequency}}, it 
will not be zero.

> Missing zero check for totalHitFrequency in CuboidStats ctor
> 
>
> Key: KYLIN-2893
> URL: https://issues.apache.org/jira/browse/KYLIN-2893
> Project: Kylin
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee:  Kaige Liu
>Priority: Minor
> Attachments: KYLIN-2893.master.001.patch
>
>
> {code}
> if (hitFrequencyMap.get(cuboid) != null) {
> tmpCuboidHitProbabilityMap.put(cuboid, unitUncertainProb
> + (1 - WEIGHT_FOR_UN_QUERY) * 
> hitFrequencyMap.get(cuboid) / totalHitFrequency);
> {code}
> We should check that totalHitFrequency is not zero before performing division.
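The proposed guard could look like the following hedged sketch (the class and method names are illustrative, not the CuboidStats source; the formula mirrors the snippet above, with a fallback when there is no query history).

```java
public class SafeProbability {
    // Mirror of the CuboidStats hit-probability formula, guarded so a
    // zero totalHitFrequency cannot cause a division by zero.
    static double hitProbability(double unitUncertainProb, double weightForUnQuery,
                                 long hitFrequency, long totalHitFrequency) {
        if (totalHitFrequency <= 0) {
            return unitUncertainProb;   // no hits recorded: fall back to the base probability
        }
        return unitUncertainProb
                + (1 - weightForUnQuery) * hitFrequency / (double) totalHitFrequency;
    }
}
```

Even if the comment above is right that the value cannot currently be zero, the guard makes the invariant local and cheap to keep.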





[jira] [Commented] (KYLIN-3201) java.lang.RuntimeException: native lz4 library not available

2018-01-28 Thread keith chen (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342822#comment-16342822
 ] 

keith chen commented on KYLIN-3201:
---

Hadoop 2.7.5

> java.lang.RuntimeException: native lz4 library not available
> 
>
> Key: KYLIN-3201
> URL: https://issues.apache.org/jira/browse/KYLIN-3201
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.1.0
>Reporter: keith chen
>Priority: Critical
>
> When I build the cube with Spark, the job fails.
> It reports some exceptions about lz4, but I did not set any lz4 properties.
>  
> 18/01/26 13:18:40 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 
> (TID 0, executor 1): java.lang.RuntimeException: native lz4 library not 
> available
>  at 
> org.apache.hadoop.io.compress.Lz4Codec.getDecompressorType(Lz4Codec.java:195)
>  at 
> org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176)
>  at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1983)
>  at 
> org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1878)
>  at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1827)
>  at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1841)
>  at 
> org.apache.hadoop.mapred.SequenceFileRecordReader.(SequenceFileRecordReader.java:49)
>  at 
> org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
>  at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:252)
>  at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:251)
>  at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:211)
>  at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>  at org.apache.spark.scheduler.Task.run(Task.scala:99)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  
> java.lang.RuntimeException: error execute 
> org.apache.kylin.engine.spark.SparkCubingByLayer
>  at 
> org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
>  at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in 
> stage 0.0 (TID 9, executor 1): java.lang.RuntimeException: native lz4 library 
> not available
>  at 
> org.apache.hadoop.io.compress.Lz4Codec.getDecompressorType(Lz4Codec.java:195)
>  at 
> org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176)
>  at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1983)
>  at 
> 

[jira] [Commented] (KYLIN-3206) Consider SQL expression aggregation when applying limit push down

2018-01-28 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342568#comment-16342568
 ] 

Shaofeng SHI commented on KYLIN-3206:
-

Hi Paul, would you like to contribute a patch to Kylin? Thanks!

> Consider SQL expression aggregation when applying limit push down
> -
>
> Key: KYLIN-3206
> URL: https://issues.apache.org/jira/browse/KYLIN-3206
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v1.6.0, v2.2.0
> Environment: Kylin 1.6.0
>Reporter: Paul Lin
>Assignee: liyang
>Priority: Major
>
> SQL like "select floor(user_level /10), sum(money) group by floor(user_level 
> /10) limit 10" should not trigger storage limit push down, because 
> expressions like "floor(user_level /10)" are a form of post-aggregation. 
> Otherwise Kylin would aggregate based on partial records, producing 
> inaccurate results. 
>  





[jira] [Commented] (KYLIN-3205) Allow one column is used for both dimension and precisely count distinct measure

2018-01-28 Thread albertoramon (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342560#comment-16342560
 ] 

albertoramon commented on KYLIN-3205:
-

Could this be the same issue as KYLIN-2679?

> Allow one column is used for both dimension and precisely count distinct 
> measure
> 
>
> Key: KYLIN-3205
> URL: https://issues.apache.org/jira/browse/KYLIN-3205
> Project: Kylin
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: v2.2.0
>Reporter: kangkaisen
>Assignee: kangkaisen
>Priority: Major
> Attachments: KYLIN-3205.patch
>
>
> I introduced a bug in KYLIN-2316; we should allow one column to be used for 
> both a dimension and a precise count-distinct measure, as long as the 
> dimension encoding is not dict.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KYLIN-2909) Refine Email Template for notification by freemarker

2018-01-28 Thread Dong Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Li resolved KYLIN-2909.

Resolution: Fixed

Thanks Yanghong! The PR has been merged to the master branch, with a minor 
refactor commit appended:

[https://github.com/apache/kylin/commit/74e3f614d6a9cc2807865b8fb210836880b8da85]

> Refine Email Template for notification by freemarker
> 
>
> Key: KYLIN-2909
> URL: https://issues.apache.org/jira/browse/KYLIN-2909
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhong Yanghong
>Assignee: Zhong Yanghong
>Priority: Major
> Fix For: v2.3.0
>
> Attachments: JOB-DISCARDED.png, JOB-SUCCEED.png, JOB_ERROR.png
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump con

2018-01-28 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342541#comment-16342541
 ] 

Shaofeng SHI commented on KYLIN-3122:
-

I think the dual date/time partition columns feature was not perfectly 
implemented. It only considered the case of fetching data from the source 
(Hive), not segment pruning at query time.

[~mahongbin] [~liyang.g...@gmail.com] I know you're investigating multiple 
partition columns; can this issue be fixed?

Vsevolod, as a temporary workaround, I suggest using a single partition 
column: define a new column in a view that composes the "THEDATE" and 
"THEHOUR" columns into one, then build the cube and query with it.

If the problem cannot be fixed, I would suggest removing the dual date/time 
partition columns here.
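The composed-column workaround also shows why pruning can be fast: with a single sortable date+hour key per segment, the matching segments fall out of a range lookup instead of a linear scan. The sketch below is hypothetical (the "yyyyMMddHH" key layout and segment list are assumptions, not Kylin internals).

```python
# Hypothetical sketch: one sortable "yyyyMMddHH" key per hourly segment
# lets a closed-open range [start, end) be resolved by binary search,
# instead of iterating over every segment.
import bisect

# Assumed deployment: 15 days x 24 hourly segments = 360 segments.
segment_keys = sorted(
    f"201710{d:02d}{h:02d}" for d in range(1, 16) for h in range(24)
)

def prune(start_key, end_key):
    # Closed-open range [start_key, end_key): O(log n) lookups.
    lo = bisect.bisect_left(segment_keys, start_key)
    hi = bisect.bisect_left(segment_keys, end_key)
    return segment_keys[lo:hi]

# thedate = '20171011' AND thehour >= '10' AND thehour < '11'
print(prune("2017101110", "2017101111"))  # ['2017101110'] -- one segment
```

With the two separate date and time columns, the pruning logic has to reason about the hour range per day, which is exactly where the closed-open vs open-closed bugs described in this issue appear.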


> Partition elimination algorithm seems to be inefficient and have serious 
> issues with handling date/time ranges, can lead to very slow queries and 
> OOM/Java heap dump conditions
> ---
>
> Key: KYLIN-3122
> URL: https://issues.apache.org/jira/browse/KYLIN-3122
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: hongbin ma
>Priority: Critical
>
> The current algorithm for cube segment elimination seems rather inefficient.
>  We are using a model where cubes are partitioned by date and time:
>  "partition_desc":
> { "partition_date_column": "A_VL_HOURLY_V.THEDATE", "partition_time_column": 
> "A_VL_HOURLY_V.THEHOUR", "partition_date_start": 0, "partition_date_format": 
> "MMdd", "partition_time_format": "HH", "partition_type": "APPEND", 
> "partition_condition_builder": 
> "org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
>  }
> ,
> Cubes contain partitions for multiple days and 24 hours for each day. Each 
> cube segment corresponds to just one hour.
> When a query is issued where both date and hour are specified with an 
> equality condition (e.g. thedate = '20171011' and thehour = '10'), Kylin 
> sequentially iterates over all the cube segments (hundreds of them) only to 
> skip all except the one that needs to be scanned (which can be observed in 
> the logs).
>  The expectation is that Kylin would use the existing info on the 
> partitioning columns (date and time) and the known hierarchical relation 
> between date and time to locate the required partition much more efficiently 
> than a linear scan through all the cube partitions.
> Now, if the filtering condition is on a range of hours, the partition 
> pruning and scanning behavior becomes illogical, which suggests bugs in the 
> logic.
> If the filtering condition is on a specific date and a closed-open range of 
> hours (e.g. thedate = '20171011' and thehour >= '10' and thehour < '11'), in 
> addition to sequentially scanning all the cube partitions (as described 
> above), Kylin will scan HBase tables for all the hours from the specified 
> starting hour to the last hour of the day (e.g. hours 10 through 24, instead 
> of just hour 10).
>  As a result, the query will run much longer than necessary and might run 
> out of memory, causing a JVM heap dump and a Kylin server crash.
> If the filtering condition is on a specific date but the hour interval is 
> specified as open-closed (e.g. thedate = '20171011' and thehour > '09' and 
> thehour <= '10'), Kylin will scan all HBase tables for all later dates and 
> hours (e.g. from hour 10 to the most recent hour on the most recent day, 
> which can be hundreds of tables and thousands of regions).
>  As a result, query execution time will increase dramatically and in most 
> cases the Kylin server will be terminated with an OOM error and a JVM heap 
> dump.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump condi

2018-01-28 Thread Shaofeng SHI (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaofeng SHI updated KYLIN-3122:

Component/s: (was: Storage - HBase)
 Query Engine

> Partition elimination algorithm seems to be inefficient and have serious 
> issues with handling date/time ranges, can lead to very slow queries and 
> OOM/Java heap dump conditions
> ---
>
> Key: KYLIN-3122
> URL: https://issues.apache.org/jira/browse/KYLIN-3122
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: hongbin ma
>Priority: Critical
>
> The current algorithm for cube segment elimination seems rather inefficient.
>  We are using a model where cubes are partitioned by date and time:
>  "partition_desc":
> { "partition_date_column": "A_VL_HOURLY_V.THEDATE", "partition_time_column": 
> "A_VL_HOURLY_V.THEHOUR", "partition_date_start": 0, "partition_date_format": 
> "MMdd", "partition_time_format": "HH", "partition_type": "APPEND", 
> "partition_condition_builder": 
> "org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
>  }
> ,
> Cubes contain partitions for multiple days and 24 hours for each day. Each 
> cube segment corresponds to just one hour.
> When a query is issued where both date and hour are specified with an 
> equality condition (e.g. thedate = '20171011' and thehour = '10'), Kylin 
> sequentially iterates over all the cube segments (hundreds of them) only to 
> skip all except the one that needs to be scanned (which can be observed in 
> the logs).
>  The expectation is that Kylin would use the existing info on the 
> partitioning columns (date and time) and the known hierarchical relation 
> between date and time to locate the required partition much more efficiently 
> than a linear scan through all the cube partitions.
> Now, if the filtering condition is on a range of hours, the partition 
> pruning and scanning behavior becomes illogical, which suggests bugs in the 
> logic.
> If the filtering condition is on a specific date and a closed-open range of 
> hours (e.g. thedate = '20171011' and thehour >= '10' and thehour < '11'), in 
> addition to sequentially scanning all the cube partitions (as described 
> above), Kylin will scan HBase tables for all the hours from the specified 
> starting hour to the last hour of the day (e.g. hours 10 through 24, instead 
> of just hour 10).
>  As a result, the query will run much longer than necessary and might run 
> out of memory, causing a JVM heap dump and a Kylin server crash.
> If the filtering condition is on a specific date but the hour interval is 
> specified as open-closed (e.g. thedate = '20171011' and thehour > '09' and 
> thehour <= '10'), Kylin will scan all HBase tables for all later dates and 
> hours (e.g. from hour 10 to the most recent hour on the most recent day, 
> which can be hundreds of tables and thousands of regions).
>  As a result, query execution time will increase dramatically and in most 
> cases the Kylin server will be terminated with an OOM error and a JVM heap 
> dump.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KYLIN-3011) Tool StorageCleanupJob will cleanup other environment's intermediate hive tables which are using

2018-01-28 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342524#comment-16342524
 ] 

Shaofeng SHI commented on KYLIN-3011:
-

Hi Yanghong,

The latest StorageCleanupJob fetches the job uuid from the hive intermediate 
table and then checks whether it was created by the current deployment; do 
you still see this problem?
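The uuid check described above can be sketched as a filter over candidate tables. This is a hypothetical illustration, not Kylin's actual implementation: the table-naming scheme, the `extract_uuid` helper, and the job sets are all invented for the example.

```python
# Hypothetical sketch: only drop intermediate tables whose embedded job uuid
# belongs to the current deployment and is not attached to a running job,
# so tables from other environments sharing the same Hive are left alone.
def tables_to_drop(all_tables, extract_uuid, local_job_uuids, running_job_uuids):
    drop = []
    for table in all_tables:
        uuid = extract_uuid(table)
        if uuid is None:
            continue  # name doesn't match the expected pattern: skip
        if uuid in local_job_uuids and uuid not in running_job_uuids:
            drop.append(table)  # created here and job finished: safe to drop
    return drop

local = {"job-a", "job-b"}    # jobs known to this deployment's metadata store
running = {"job-b"}           # jobs still executing
tables = ["tbl_job-a", "tbl_job-b", "tbl_job-x"]  # tbl_job-x: other env

print(tables_to_drop(tables, lambda t: t.split("_", 1)[1], local, running))
# ['tbl_job-a']
```

The key property is that an unrecognized uuid means "not ours", so the cleanup job errs on the side of keeping tables it cannot attribute to the current deployment.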

> Tool StorageCleanupJob will cleanup other environment's intermediate hive 
> tables which are using
> 
>
> Key: KYLIN-3011
> URL: https://issues.apache.org/jira/browse/KYLIN-3011
> Project: Kylin
>  Issue Type: Bug
>Reporter: Zhong Yanghong
>Assignee:  Kaige Liu
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KYLIN-2556) Switch Findbugs to Spotbugs

2018-01-28 Thread Shaofeng SHI (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342521#comment-16342521
 ] 

Shaofeng SHI commented on KYLIN-2556:
-

Ok, thanks!

> Switch Findbugs to Spotbugs
> ---
>
> Key: KYLIN-2556
> URL: https://issues.apache.org/jira/browse/KYLIN-2556
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Shaofeng SHI
>Priority: Major
> Fix For: v2.3.0
>
>
> HADOOP-14316 added Spotbugs which is more powerful than findbugs.
> This issue is to introduce Spotbugs to Kylin in a similar manner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KYLIN-3206) Consider SQL expression aggregation when applying limit push down

2018-01-28 Thread Paul Lin (JIRA)
Paul Lin created KYLIN-3206:
---

 Summary: Consider SQL expression aggregation when applying limit 
push down
 Key: KYLIN-3206
 URL: https://issues.apache.org/jira/browse/KYLIN-3206
 Project: Kylin
  Issue Type: Bug
  Components: Query Engine
Affects Versions: v2.2.0, v1.6.0
 Environment: Kylin 1.6.0
Reporter: Paul Lin
Assignee: liyang


SQL statements like "select floor(user_level / 10), sum(money) group by 
floor(user_level / 10) limit 10" should not trigger storage limit push down, 
because expressions like "floor(user_level / 10)" are a form of post 
aggregation. Otherwise Kylin would aggregate based on partial records, 
producing inaccurate results.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)