[jira] [Updated] (KYLIN-3114) Enable kylin.web.query-timeout for web query request
[ https://issues.apache.org/jira/browse/KYLIN-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billy Liu updated KYLIN-3114: - Summary: Enable kylin.web.query-timeout for web query request (was: Make timeout for the queries submitted through the Web UI configurable) > Enable kylin.web.query-timeout for web query request > > > Key: KYLIN-3114 > URL: https://issues.apache.org/jira/browse/KYLIN-3114 > Project: Kylin > Issue Type: Improvement > Components: Web >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2.0 >Reporter: Vsevolod Ostapenko >Assignee: Vsevolod Ostapenko >Priority: Minor > Fix For: v2.3.0 > > Attachments: KYLIN-3114.master.002.patch > > Original Estimate: 48h > Remaining Estimate: 48h > > Currently query.js hard-codes the timeout for queries submitted via the Web UI > at 300,000 milliseconds. > Depending on the situation, the default value can be either too large or too > small, especially when a query does not hit any cube and is passed through to > Hive or Impala. > The query timeout should be made configurable via kylin.properties. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
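The change requested in KYLIN-3114 can be sketched as follows. This is an illustrative Python sketch, not Kylin's actual code (the real change touches query.js and kylin.properties); the function name and fallback behavior are assumptions:

```python
# Illustrative sketch: resolve the web query timeout from a properties map,
# falling back to the value query.js used to hard-code.
DEFAULT_QUERY_TIMEOUT_MS = 300_000

def resolve_query_timeout(props):
    """Return the web query timeout in ms, honoring kylin.web.query-timeout."""
    raw = props.get("kylin.web.query-timeout")
    if raw is None or not str(raw).strip():
        # Property absent or blank: keep the old hard-coded default.
        return DEFAULT_QUERY_TIMEOUT_MS
    timeout = int(raw)
    if timeout <= 0:
        raise ValueError("kylin.web.query-timeout must be positive")
    return timeout
```

With this shape, existing deployments that never set the property keep the old 300,000 ms behavior, while others can tune it per environment (e.g. larger for pass-through queries to Hive or Impala).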
[jira] [Resolved] (KYLIN-3124) Support horizontal scroll bar in 'Insight'
[ https://issues.apache.org/jira/browse/KYLIN-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billy Liu resolved KYLIN-3124. -- Resolution: Fixed > Support horizontal scroll bar in 'Insight' > -- > > Key: KYLIN-3124 > URL: https://issues.apache.org/jira/browse/KYLIN-3124 > Project: Kylin > Issue Type: Improvement > Components: Web >Affects Versions: v2.2.0 > Environment: Windows, Chrome Version 63.0.3239.108, Firefox 57.0.2 >Reporter: atul >Assignee: Zhixiong Chen >Priority: Minor > Fix For: v2.3.0 > > Attachments: horizontal_scrollbar_missing.jpg > > > As shown in the attached screenshot, due to the missing scroll bar in the Kylin Web UI > (http://:7070/kylin/query), it's difficult to view the full name of a cube > after executing a query from the "Insight" view of the UI. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3182) Update Kylin help menu links
[ https://issues.apache.org/jira/browse/KYLIN-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342996#comment-16342996 ] Billy Liu commented on KYLIN-3182: -- Thanks [~Zhixiong Chen], I will update the links to docs23 later. > Update Kylin help menu links > > > Key: KYLIN-3182 > URL: https://issues.apache.org/jira/browse/KYLIN-3182 > Project: Kylin > Issue Type: Improvement > Components: Web >Affects Versions: v2.3.0 >Reporter: peng.jianhua >Assignee: Zhixiong Chen >Priority: Minor > Labels: patch > Fix For: v2.3.0 > > Attachments: 0001-KYLIN-3182-Kylin-Help-is-not-available.patch, > 0001-KYLIN-3182-complete-web-help-configuration.patch, help_blank.png > > > Kylin Help is not available; it looks like > !help_blank.png! > I analysed the web code and found the problem. > The default help entries are configured in > 'kylin-defaults.properties', as follows: > {code:java} > kylin.web.help.length=4 > kylin.web.help.0=start|Getting Started| > kylin.web.help.1=odbc|ODBC Driver| > kylin.web.help.2=tableau|Tableau Guide| > kylin.web.help.3=onboard|Cube Design Tutorial| > {code} > None of the links are configured, and the web page checks whether > each link is empty; if a link is empty, the item under 'Help' is not > shown. > So I modified the web code: if the link of a help document is null, it is > discarded instead of being rendered on the header page. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
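The filtering behavior described in KYLIN-3182 can be sketched as follows; an illustrative Python sketch, not the actual web code, assuming the `key|title|link` entry format shown in the properties above and dropping entries whose link is empty:

```python
def parse_help_entries(props):
    """Parse kylin.web.help.N entries of the form 'key|title|link',
    dropping items whose link is missing or blank (the KYLIN-3182 fix)."""
    count = int(props.get("kylin.web.help.length", 0))
    items = []
    for i in range(count):
        raw = props.get("kylin.web.help.%d" % i, "")
        parts = raw.split("|")
        # Only keep entries that actually carry a non-empty link.
        if len(parts) >= 3 and parts[2].strip():
            items.append({"key": parts[0], "title": parts[1], "link": parts[2]})
    return items
```

With the defaults quoted in the issue (all links blank), this returns an empty menu instead of dead 'Help' items; once links are filled in, those entries appear.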
[jira] [Updated] (KYLIN-3207) Add blog Superset integrate Kylin
[ https://issues.apache.org/jira/browse/KYLIN-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billy Liu updated KYLIN-3207: - Fix Version/s: v2.3.0 > Add blog Superset integrate Kylin > - > > Key: KYLIN-3207 > URL: https://issues.apache.org/jira/browse/KYLIN-3207 > Project: Kylin > Issue Type: Improvement >Reporter: yongjie zhao >Assignee: yongjie zhao >Priority: Minor > Fix For: v2.3.0 > > Attachments: 0001-Add-blog-superset-with-kylin.patch > > > Add blog Superset integrate Kylin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KYLIN-3208) How to enable custom rule
[ https://issues.apache.org/jira/browse/KYLIN-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billy Liu updated KYLIN-3208: - Issue Type: New Feature (was: Improvement) > How to enable custom rule > - > > Key: KYLIN-3208 > URL: https://issues.apache.org/jira/browse/KYLIN-3208 > Project: Kylin > Issue Type: New Feature > Components: Job Engine >Affects Versions: v2.1.0 >Reporter: Manoj kumar >Assignee: Shaofeng SHI >Priority: Major > > - How to enable a custom rule for data transformation: while > creating the data model, there should be some way for the user to define rules on the > data being selected. Before building the cube, some rules need to be applied > based on the user. Most OLAP solutions, such as ESSBASE/TM1, do provide this > feature. > - Entitlement of data: how is that done in Kylin? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-2929) speed up Dump file performance
[ https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342967#comment-16342967 ] fengYu commented on KYLIN-2929: --- Uploaded a new patch; please review it when you have time. > speed up Dump file performance > -- > > Key: KYLIN-2929 > URL: https://issues.apache.org/jira/browse/KYLIN-2929 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.0.0 >Reporter: fengYu >Assignee: fengYu >Priority: Major > Labels: Performance > Fix For: v2.3.0 > > Attachments: > 0001-KYLIN-2929-speed-up-dump-performance-write-dump-file.patch > > > While working on KYLIN-2926, I found that the coprocessor dumps to disk once > estimatedMemSize is bigger than spillThreshold, and that the spilled data > size is far smaller than estimatedMemSize; in my case the dump file > size is about 8 MB while spillThreshold is set to 3 GB. > So, I try to keep the spill data in memory rather than writing the file to disk > immediately, and when the in-memory spill data reaches the threshold, write > all spill files together. > In my case, the coprocessor processing time dropped from 22s to 16s, > about a 30% improvement. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KYLIN-2929) speed up Dump file performance
[ https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fengYu updated KYLIN-2929: -- Attachment: 0001-KYLIN-2929-speed-up-dump-performance-write-dump-file.patch > speed up Dump file performance > -- > > Key: KYLIN-2929 > URL: https://issues.apache.org/jira/browse/KYLIN-2929 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.0.0 >Reporter: fengYu >Assignee: fengYu >Priority: Major > Labels: Performance > Fix For: v2.3.0 > > Attachments: > 0001-KYLIN-2929-speed-up-dump-performance-write-dump-file.patch > > > While working on KYLIN-2926, I found that the coprocessor dumps to disk once > estimatedMemSize is bigger than spillThreshold, and that the spilled data > size is far smaller than estimatedMemSize; in my case the dump file > size is about 8 MB while spillThreshold is set to 3 GB. > So, I try to keep the spill data in memory rather than writing the file to disk > immediately, and when the in-memory spill data reaches the threshold, write > all spill files together. > In my case, the coprocessor processing time dropped from 22s to 16s, > about a 30% improvement. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KYLIN-2929) speed up Dump file performance
[ https://issues.apache.org/jira/browse/KYLIN-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fengYu updated KYLIN-2929: -- Attachment: (was: 0002-KYLIN-2929-speed-up-dump-performance-write-dump-file.patch) > speed up Dump file performance > -- > > Key: KYLIN-2929 > URL: https://issues.apache.org/jira/browse/KYLIN-2929 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.0.0 >Reporter: fengYu >Assignee: fengYu >Priority: Major > Labels: Performance > Fix For: v2.3.0 > > Attachments: > 0001-KYLIN-2929-speed-up-dump-performance-write-dump-file.patch > > > While working on KYLIN-2926, I found that the coprocessor dumps to disk once > estimatedMemSize is bigger than spillThreshold, and that the spilled data > size is far smaller than estimatedMemSize; in my case the dump file > size is about 8 MB while spillThreshold is set to 3 GB. > So, I try to keep the spill data in memory rather than writing the file to disk > immediately, and when the in-memory spill data reaches the threshold, write > all spill files together. > In my case, the coprocessor processing time dropped from 22s to 16s, > about a 30% improvement. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
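The idea behind the KYLIN-2929 patch — keep spill chunks in memory and write them out together once their combined size crosses a threshold — can be sketched as follows. Illustrative Python, not the actual coprocessor code; the class and parameter names are hypothetical:

```python
import io

class BufferedSpiller:
    """Sketch of the patch's idea: buffer spill chunks in memory and
    write them to the sink in one pass once the threshold is reached."""

    def __init__(self, sink, threshold_bytes):
        self.sink = sink              # file-like object, e.g. open(path, 'wb')
        self.threshold = threshold_bytes
        self.chunks = []
        self.buffered = 0

    def spill(self, data):
        # Instead of writing each spill to disk immediately, accumulate it.
        self.chunks.append(data)
        self.buffered += len(data)
        if self.buffered >= self.threshold:
            self.flush()

    def flush(self):
        # One sequential write of all buffered chunks, then reset.
        for chunk in self.chunks:
            self.sink.write(chunk)
        self.chunks.clear()
        self.buffered = 0
```

This trades memory (bounded by the threshold) for fewer, larger sequential writes — consistent with the reported drop from 22s to 16s when individual spills (about 8 MB) were far below the 3 GB threshold.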
[jira] [Created] (KYLIN-3208) How to enable custom rule
Manoj kumar created KYLIN-3208: -- Summary: How to enable custom rule Key: KYLIN-3208 URL: https://issues.apache.org/jira/browse/KYLIN-3208 Project: Kylin Issue Type: Improvement Components: Job Engine Affects Versions: v2.1.0 Reporter: Manoj kumar Assignee: Shaofeng SHI - How to enable a custom rule for data transformation: while creating the data model, there should be some way for the user to define rules on the data being selected. Before building the cube, some rules need to be applied based on the user. Most OLAP solutions, such as ESSBASE/TM1, do provide this feature. - Entitlement of data: how is that done in Kylin? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3201) java.lang.RuntimeException: native lz4 library not available
[ https://issues.apache.org/jira/browse/KYLIN-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342919#comment-16342919 ] Kaige Liu commented on KYLIN-3201: --- Try this: add "*kylin.engine.spark-conf.executor.extraJavaOptions=-Dhdp.version=2.6.2.3-1 -Djava.library.path=/usr/hdp/current/hadoop-client/lib/native/*" in kylin.properties. The above example is for HDP 2.6; modify it according to your Hadoop distribution. > java.lang.RuntimeException: native lz4 library not available > > > Key: KYLIN-3201 > URL: https://issues.apache.org/jira/browse/KYLIN-3201 > Project: Kylin > Issue Type: Bug >Affects Versions: v2.1.0 >Reporter: Keith Chen >Priority: Critical > > When I build a cube with Spark, the job fails. > It reports some exceptions about lz4, but I did not set any lz4 properties. > > 18/01/26 13:18:40 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 > (TID 0, executor 1): java.lang.RuntimeException: native lz4 library not > available > at > org.apache.hadoop.io.compress.Lz4Codec.getDecompressorType(Lz4Codec.java:195) > at > org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176) > at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1983) > at > org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1878) > at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1827) > at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1841) > at > org.apache.hadoop.mapred.SequenceFileRecordReader.(SequenceFileRecordReader.java:49) > at > org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64) > at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:252) > at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:251) > at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:211) > at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) > at org.apache.spark.scheduler.Task.run(Task.scala:99) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > > java.lang.RuntimeException: error execute > org.apache.kylin.engine.spark.SparkCubingByLayer > at > org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42) > at 
org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637) > Caused by: org.apache.spark.SparkException: Job aborted due to stage > failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task > 2.3 in stage 0.0 (TID 9, executor 1): java.lang.RuntimeException: native lz4 > library not available > at >
[jira] [Assigned] (KYLIN-3206) Consider SQL expression aggregation when applying limit push down
[ https://issues.apache.org/jira/browse/KYLIN-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billy Liu reassigned KYLIN-3206: Assignee: Paul Lin (was: liyang) > Consider SQL expression aggregation when applying limit push down > - > > Key: KYLIN-3206 > URL: https://issues.apache.org/jira/browse/KYLIN-3206 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v1.6.0, v2.2.0 > Environment: Kylin 1.6.0 >Reporter: Paul Lin >Assignee: Paul Lin >Priority: Major > > SQLs like "select floor(user_level /10), sum(money) group by floor(user_level > /10) limit 10" should not trigger storage limit push down, because the > expressions like "floor(user_level /10)" are sort of post aggregations. > Otherwise Kylin would do aggregations based on partial records, producing > inaccurate results. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
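Why pushing the limit below a post-aggregation expression produces wrong results, as KYLIN-3206 describes, can be shown with a small Python sketch (hypothetical data, mimicking "select floor(user_level / 10), sum(money) group by floor(user_level / 10) limit N"):

```python
from collections import defaultdict

def aggregate(rows, limit=None):
    """Group by floor(user_level / 10) and sum money.
    If `limit` is set, only the first `limit` input rows are consumed,
    mimicking a storage-level limit push down below the aggregation."""
    if limit is not None:
        rows = rows[:limit]
    totals = defaultdict(int)
    for user_level, money in rows:
        totals[user_level // 10] += money  # floor(user_level / 10)
    return dict(totals)

rows = [(5, 10), (15, 20), (7, 30), (18, 40)]
# Correct plan: aggregate every record, then apply the limit to the groups.
full = aggregate(rows)
# Wrong plan: the pushed-down limit truncates the input, so each group's
# sum is computed from partial records.
pushed = aggregate(rows, limit=2)
```

Here `full` is `{0: 40, 1: 60}` but `pushed` is `{0: 10, 1: 20}` — the same two groups survive, yet the sums are wrong, which is exactly the inaccuracy the issue warns about.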
[jira] [Resolved] (KYLIN-3202) Doc directory for 2.3
[ https://issues.apache.org/jira/browse/KYLIN-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billy Liu resolved KYLIN-3202. -- Resolution: Fixed Assignee: Billy Liu Fix Version/s: v2.3.0 Commit: http://git-wip-us.apache.org/repos/asf/kylin/commit/40a53fe3 > Doc directory for 2.3 > - > > Key: KYLIN-3202 > URL: https://issues.apache.org/jira/browse/KYLIN-3202 > Project: Kylin > Issue Type: Sub-task >Reporter: Billy Liu >Assignee: Billy Liu >Priority: Major > Fix For: v2.3.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KYLIN-3207) Add blog Superset integrate Kylin
yongjie zhao created KYLIN-3207: --- Summary: Add blog Superset integrate Kylin Key: KYLIN-3207 URL: https://issues.apache.org/jira/browse/KYLIN-3207 Project: Kylin Issue Type: Improvement Reporter: yongjie zhao Assignee: yongjie zhao Attachments: 0001-Add-blog-superset-with-kylin.patch Add blog Superset integrate Kylin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KYLIN-3198) More Chinese Howto Documents
[ https://issues.apache.org/jira/browse/KYLIN-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billy Liu reassigned KYLIN-3198: Assignee: Paul Lin > More Chinese Howto Documents > > > Key: KYLIN-3198 > URL: https://issues.apache.org/jira/browse/KYLIN-3198 > Project: Kylin > Issue Type: Improvement > Components: Documentation >Reporter: Billy Liu >Assignee: Paul Lin >Priority: Minor > Fix For: v2.3.0 > > Attachments: 80.patch > > > From [https://github.com/apache/kylin/pull/80] > Five how to documents are added: > > {{[added Chinese version of > howto_optimize_build|https://github.com/apache/kylin/pull/80/commits/25206674f9eda19417a3c5d5f15dae8adcdb6952]}} > > {{[added Chinese version of > howto_backup_metadata|https://github.com/apache/kylin/pull/80/commits/e773c00b030c29a79419403a86430715f1be57c3]}} > > [added Chinese version of > howto_build_cube_with_restapi|https://github.com/apache/kylin/pull/80/commits/087fb355b74f34cd9cc345ad4c3d76df11fd3c97] > > {{[added Chinese version of > howto_cleanup_storage|https://github.com/apache/kylin/pull/80/commits/7f2c690267b5a2b8c459e3ad07691c280ffd4e8c]}} > > {{[added Chinese version of > howto_jdbc|https://github.com/apache/kylin/pull/80/commits/38477271eed350d9576f484b414ddf2576d54d06]}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KYLIN-3201) java.lang.RuntimeException: native lz4 library not available
[ https://issues.apache.org/jira/browse/KYLIN-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Keith Chen updated KYLIN-3201: -- Description: When i build cube with spark , the job was failed. It report some exceptions about lz4, but i do not set lz4 properties. 18/01/26 13:18:40 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, executor 1): java.lang.RuntimeException: native lz4 library not available at org.apache.hadoop.io.compress.Lz4Codec.getDecompressorType(Lz4Codec.java:195) at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1983) at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1878) at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1827) at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1841) at org.apache.hadoop.mapred.SequenceFileRecordReader.(SequenceFileRecordReader.java:49) at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64) at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:252) at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:251) at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:211) at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) at org.apache.spark.scheduler.Task.run(Task.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkCubingByLayer at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42) at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637) Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in stage 0.0 (TID 9, executor 1): 
java.lang.RuntimeException: native lz4 library not available at org.apache.hadoop.io.compress.Lz4Codec.getDecompressorType(Lz4Codec.java:195) at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1983) at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1878) at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1827) at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1841) at org.apache.hadoop.mapred.SequenceFileRecordReader.(SequenceFileRecordReader.java:49) at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64) at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:252) at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:251) at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:211) at
[jira] [Commented] (KYLIN-2893) Missing zero check for totalHitFrequency in CuboidStats ctor
[ https://issues.apache.org/jira/browse/KYLIN-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342823#comment-16342823 ] Zhong Yanghong commented on KYLIN-2893: --- Hi [~liukaige], based on the logic to calculate {{totalHitFrequency}}, it will not be zero. > Missing zero check for totalHitFrequency in CuboidStats ctor > > > Key: KYLIN-2893 > URL: https://issues.apache.org/jira/browse/KYLIN-2893 > Project: Kylin > Issue Type: Bug >Reporter: Ted Yu >Assignee: Kaige Liu >Priority: Minor > Attachments: KYLIN-2893.master.001.patch > > > {code} > if (hitFrequencyMap.get(cuboid) != null) { > tmpCuboidHitProbabilityMap.put(cuboid, unitUncertainProb > + (1 - WEIGHT_FOR_UN_QUERY) * > hitFrequencyMap.get(cuboid) / totalHitFrequency); > {code} > We should check that totalHitFrequency is not zero before performing division. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
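The guarded division discussed in KYLIN-2893 can be sketched as follows; illustrative Python, with WEIGHT_FOR_UN_QUERY as a hypothetical stand-in value for the constant in CuboidStats:

```python
WEIGHT_FOR_UN_QUERY = 0.5  # hypothetical value; the real constant lives in CuboidStats

def hit_probability(cuboid_hits, total_hits, unit_uncertain_prob):
    """Guarded version of the formula in the snippet: fall back to the
    uncertain prior instead of dividing when total_hits is zero."""
    if total_hits == 0:
        return unit_uncertain_prob
    return unit_uncertain_prob + (1 - WEIGHT_FOR_UN_QUERY) * cuboid_hits / total_hits
```

Even if, as the comment argues, `totalHitFrequency` cannot be zero under the current calculation, the guard makes the invariant local to the formula rather than depending on distant code.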
[jira] [Commented] (KYLIN-3201) java.lang.RuntimeException: native lz4 library not available
[ https://issues.apache.org/jira/browse/KYLIN-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342822#comment-16342822 ] keith chen commented on KYLIN-3201: --- Hadoop 2.7.5 > java.lang.RuntimeException: native lz4 library not available > > > Key: KYLIN-3201 > URL: https://issues.apache.org/jira/browse/KYLIN-3201 > Project: Kylin > Issue Type: Bug >Affects Versions: v2.1.0 >Reporter: keith chen >Priority: Critical > > When i build cube with spark , the job was failed. > it report some exceptions about lz4, but i do not set lz4 properties. > > 18/01/26 13:18:40 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 > (TID 0, executor 1): java.lang.RuntimeException: native lz4 library not > available > at > org.apache.hadoop.io.compress.Lz4Codec.getDecompressorType(Lz4Codec.java:195) > at > org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176) > at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1983) > at > org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1878) > at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1827) > at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1841) > at > org.apache.hadoop.mapred.SequenceFileRecordReader.(SequenceFileRecordReader.java:49) > at > org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64) > at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:252) > at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:251) > at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:211) > at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) > at org.apache.spark.scheduler.Task.run(Task.scala:99) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > > java.lang.RuntimeException: error execute > org.apache.kylin.engine.spark.SparkCubingByLayer > at > org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42) > at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637) > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in > stage 0.0 (TID 9, executor 1): java.lang.RuntimeException: native lz4 library > not available > at > org.apache.hadoop.io.compress.Lz4Codec.getDecompressorType(Lz4Codec.java:195) > at > org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176) > at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1983) > at >
[jira] [Commented] (KYLIN-3206) Consider SQL expression aggregation when applying limit push down
[ https://issues.apache.org/jira/browse/KYLIN-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342568#comment-16342568 ] Shaofeng SHI commented on KYLIN-3206: - Hi Paul, would you like to contribute a patch to Kylin? Thanks! > Consider SQL expression aggregation when applying limit push down > - > > Key: KYLIN-3206 > URL: https://issues.apache.org/jira/browse/KYLIN-3206 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v1.6.0, v2.2.0 > Environment: Kylin 1.6.0 >Reporter: Paul Lin >Assignee: liyang >Priority: Major > > SQLs like "select floor(user_level /10), sum(money) group by floor(user_level > /10) limit 10" should not trigger storage limit push down, because the > expressions like "floor(user_level /10)" are sort of post aggregations. > Otherwise Kylin would do aggregations based on partial records, producing > inaccurate results. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3205) Allow one column is used for both dimension and precisely count distinct measure
[ https://issues.apache.org/jira/browse/KYLIN-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342560#comment-16342560 ] albertoramon commented on KYLIN-3205: - Could be the same issue: KYLIN-2679 ? > Allow one column is used for both dimension and precisely count distinct > measure > > > Key: KYLIN-3205 > URL: https://issues.apache.org/jira/browse/KYLIN-3205 > Project: Kylin > Issue Type: Bug > Components: Metadata >Affects Versions: v2.2.0 >Reporter: kangkaisen >Assignee: kangkaisen >Priority: Major > Attachments: KYLIN-3205.patch > > > I Introduced a bug in KYLIN-2316, we should allow one column is used for both > dimension and precisely count distinct measure, as long as the dimension > encoding is not dict. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KYLIN-2909) Refine Email Template for notification by freemarker
[ https://issues.apache.org/jira/browse/KYLIN-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Li resolved KYLIN-2909. Resolution: Fixed Thanks Yanghong! The PR has been merged to the master branch, with a minor refactor commit appended: [https://github.com/apache/kylin/commit/74e3f614d6a9cc2807865b8fb210836880b8da85] > Refine Email Template for notification by freemarker > > > Key: KYLIN-2909 > URL: https://issues.apache.org/jira/browse/KYLIN-2909 > Project: Kylin > Issue Type: Improvement >Reporter: Zhong Yanghong >Assignee: Zhong Yanghong >Priority: Major > Fix For: v2.3.0 > > Attachments: JOB-DISCARDED.png, JOB-SUCCEED.png, JOB_ERROR.png > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump con
[ https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342541#comment-16342541 ] Shaofeng SHI commented on KYLIN-3122: - Dual date/time partition columns were not perfectly implemented, I think. The implementation only considered the case of fetching data from the source (Hive), not segment pruning at query time. [~mahongbin] [~liyang.g...@gmail.com] I know you're investigating multiple partition columns; can this issue be fixed? Vsevolod, as a temporary solution, I suggest using one column as the partition column; you can define a new column with a view, which composes the "THEDATE" and "THEHOUR" columns into one column, and then build the cube and query with it. If the problem cannot be fixed, I would suggest removing the dual date/time partition columns here. > Partition elimination algorithm seems to be inefficient and have serious > issues with handling date/time ranges, can lead to very slow queries and > OOM/Java heap dump conditions > --- > > Key: KYLIN-3122 > URL: https://issues.apache.org/jira/browse/KYLIN-3122 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2.0 >Reporter: Vsevolod Ostapenko >Assignee: hongbin ma >Priority: Critical > > The current algorithm of cube segment elimination seems to be rather inefficient. > We are using a model where cubes are partitioned by date and time: > "partition_desc": > { "partition_date_column": "A_VL_HOURLY_V.THEDATE", "partition_time_column": > "A_VL_HOURLY_V.THEHOUR", "partition_date_start": 0, "partition_date_format": > "MMdd", "partition_time_format": "HH", "partition_type": "APPEND", > "partition_condition_builder": > "org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder" > } > , > Cubes contain partitions for multiple days and 24 hours for each day. Each > cube segment corresponds to just one hour. 
> When a query is issued where both date and hour are specified using an equality > condition (e.g. thedate = '20171011' and thehour = '10'), Kylin sequentially > iterates over all the cube segments (hundreds of them) only to skip all > except the one that needs to be scanned (which can be observed by looking > in the logs). > The expectation is that Kylin would use the existing info on the partitioning > columns (date and time) and the known hierarchical relation between date and > time to locate the required partition much more efficiently than a linear scan > through all the cube partitions. > Now, if the filtering condition is on a range of hours, the behavior of the > partition pruning and scanning becomes illogical, which suggests bugs > in the logic. > If the filtering condition is on a specific date and a closed-open range of hours > (e.g. thedate = '20171011' and thehour >= '10' and thehour < '11'), in > addition to sequentially scanning all the cube partitions (as described > above), Kylin will scan HBase tables for all the hours from the specified > starting hour to the last hour of the day (e.g. from hour 10 to 24, > instead of just hour 10). > As a result, the query will run much longer than necessary and might run out > of memory, causing a JVM heap dump and a Kylin server crash. > If the filtering condition is on a specific date but the hour interval is specified as > open-closed (e.g. thedate = '20171011' and thehour > '09' and thehour <= > '10'), Kylin will scan HBase tables for all the later dates and hours > (e.g. from hour 10 till the most recent hour on the most recent day, > which can be hundreds of tables and thousands of regions). > As a result, query execution time will increase dramatically, and in most cases the > Kylin server will be terminated with an OOM error and a JVM heap dump. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
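The expected pruning behavior described above can be sketched in a few lines. This is a hypothetical simplification for illustration only, not Kylin's actual implementation; the function name and segment representation are invented here. It shows how hierarchical (date, hour) pruning with correct open/closed bounds should select exactly one hourly segment for the ranges in the report:

```python
# Hypothetical sketch of hierarchical (date, hour) segment pruning.
# Each cube segment covers exactly one hour of one day, identified by
# a (date, hour) pair such as ("20171011", 10).

def prune_segments(segments, date, hour_lo, hour_hi,
                   lo_inclusive=True, hi_inclusive=False):
    """Return only the segments that can contain matching rows.

    Instead of iterating over every segment and scanning it, the
    hierarchical relation between date and hour lets us discard any
    segment whose date differs and any segment whose hour falls
    outside the requested range.
    """
    selected = []
    for seg_date, seg_hour in segments:
        if seg_date != date:
            continue  # date mismatch: the whole day can be skipped
        above_lo = seg_hour >= hour_lo if lo_inclusive else seg_hour > hour_lo
        below_hi = seg_hour <= hour_hi if hi_inclusive else seg_hour < hour_hi
        if above_lo and below_hi:
            selected.append((seg_date, seg_hour))
    return selected

# Three days x 24 hourly segments.
segments = [(d, h) for d in ("20171010", "20171011", "20171012")
            for h in range(24)]

# thedate = '20171011' AND thehour >= 10 AND thehour < 11
# should touch exactly one segment, not hours 10..23.
print(prune_segments(segments, "20171011", 10, 11))  # [('20171011', 10)]

# The open-closed form (thehour > 9 AND thehour <= 10) should likewise
# touch exactly one segment, not everything up to the latest hour.
print(prune_segments(segments, "20171011", 9, 10,
                     lo_inclusive=False, hi_inclusive=True))
```

Both calls return `[('20171011', 10)]`; the bugs in the report correspond to the pruner ignoring the upper bound (closed-open case) or both bounds on later dates (open-closed case).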
[jira] [Updated] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump condi
[ https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shaofeng SHI updated KYLIN-3122: Component/s: (was: Storage - HBase) Query Engine > Partition elimination algorithm seems to be inefficient and have serious > issues with handling date/time ranges, can lead to very slow queries and > OOM/Java heap dump conditions > --- > > Key: KYLIN-3122 > URL: https://issues.apache.org/jira/browse/KYLIN-3122 > Project: Kylin > Issue Type: Bug > Components: Query Engine >Affects Versions: v2.2.0 > Environment: HDP 2.5.6, Kylin 2.2.0 >Reporter: Vsevolod Ostapenko >Assignee: hongbin ma >Priority: Critical > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3011) Tool StorageCleanupJob will cleanup other environment's intermediate hive tables which are using
[ https://issues.apache.org/jira/browse/KYLIN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342524#comment-16342524 ] Shaofeng SHI commented on KYLIN-3011: - Hi yanghong, the latest StorageCleanupJob fetches the job UUID from the Hive intermediate table and then checks whether it was created by the current deployment; do you still see the problem? > Tool StorageCleanupJob will cleanup other environment's intermediate hive > tables which are using > > > Key: KYLIN-3011 > URL: https://issues.apache.org/jira/browse/KYLIN-3011 > Project: Kylin > Issue Type: Bug >Reporter: Zhong Yanghong >Assignee: Kaige Liu >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
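The ownership check the comment describes can be sketched as follows. This is a minimal illustration, not Kylin's actual code: the table-name pattern and function name are assumptions, chosen only to show the idea of extracting an embedded job UUID and comparing it against jobs known to the local deployment before dropping anything:

```python
import re

# Hypothetical sketch: decide whether an intermediate Hive table belongs
# to this Kylin deployment by extracting the job UUID embedded in its
# name and checking it against the jobs known to the local metadata store.
# The name pattern below is an assumption for illustration.
TABLE_PATTERN = re.compile(
    r"kylin_intermediate_.*_"
    r"([0-9a-f]{8}_[0-9a-f]{4}_[0-9a-f]{4}_[0-9a-f]{4}_[0-9a-f]{12})$"
)

def owned_by_this_deployment(table_name, local_job_uuids):
    m = TABLE_PATTERN.match(table_name)
    if not m:
        return False  # not an intermediate table we recognize; leave it alone
    uuid = m.group(1).replace("_", "-")
    # Only tables whose job UUID exists in the local job store are safe
    # to drop; tables from another environment will not match.
    return uuid in local_job_uuids
```

Under this scheme, a cleanup run in environment A simply skips tables whose UUIDs belong to jobs submitted from environment B, which is the behavior the comment says the latest StorageCleanupJob implements.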
[jira] [Commented] (KYLIN-2556) Switch Findbugs to Spotbugs
[ https://issues.apache.org/jira/browse/KYLIN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342521#comment-16342521 ] Shaofeng SHI commented on KYLIN-2556: - Ok, thanks! > Switch Findbugs to Spotbugs > --- > > Key: KYLIN-2556 > URL: https://issues.apache.org/jira/browse/KYLIN-2556 > Project: Kylin > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Shaofeng SHI >Priority: Major > Fix For: v2.3.0 > > > HADOOP-14316 added Spotbugs which is more powerful than findbugs. > This issue is to introduce Spotbugs to Kylin in a similar manner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KYLIN-3206) Consider SQL expression aggregation when applying limit push down
Paul Lin created KYLIN-3206: --- Summary: Consider SQL expression aggregation when applying limit push down Key: KYLIN-3206 URL: https://issues.apache.org/jira/browse/KYLIN-3206 Project: Kylin Issue Type: Bug Components: Query Engine Affects Versions: v2.2.0, v1.6.0 Environment: Kylin 1.6.0 Reporter: Paul Lin Assignee: liyang SQL statements like "select floor(user_level /10), sum(money) group by floor(user_level /10) limit 10" should not trigger storage limit push down, because expressions like "floor(user_level /10)" are effectively post-aggregations. Otherwise Kylin would aggregate based on partial records, producing inaccurate results. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
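Why pushing the limit below such an expression aggregation is unsafe can be shown with a toy example. This is not Kylin code; it just simulates the grouping with a few hand-picked rows, assuming the hypothetical column layout (user_level, money):

```python
from collections import defaultdict
from math import floor

# Toy illustration: why a storage-level LIMIT under a post-aggregation
# expression like floor(user_level / 10) produces wrong sums.
rows = [(5, 10), (15, 20), (7, 30), (18, 40)]  # (user_level, money)

def aggregate(records):
    sums = defaultdict(int)
    for level, money in records:
        sums[floor(level / 10)] += money  # group by floor(user_level / 10)
    return dict(sums)

full = aggregate(rows)        # correct: sees every record
pushed = aggregate(rows[:2])  # LIMIT 2 pushed below the aggregation

print(full)    # {0: 40, 1: 60}
print(pushed)  # {0: 10, 1: 20} -- partial records, inaccurate sums
```

Because the storage layer cannot know which buckets of `floor(user_level / 10)` the later rows fall into, truncating the scan changes the aggregate values themselves, not just how many groups are returned; hence the limit must stay above the aggregation for such queries.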