[jira] [Commented] (KYLIN-2106) UI bug - Advanced Settings - Rowkeys - new Integer dictionary encoding - could possibly impact also cube metadata
[ https://issues.apache.org/jira/browse/KYLIN-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15592904#comment-15592904 ] Richard Calaba commented on KYLIN-2106:
---
Found it myself - nicely described here :) - https://stackoverflow.com/questions/21903805/how-to-download-a-single-commit-diff-from-github
To get this patch: https://github.com/apache/kylin/commit/1e049817856ede06c7c8736ad1d608765f301a21.patch

> UI bug - Advanced Settings - Rowkeys - new Integer dictionary encoding - could possibly impact also cube metadata
>
> Key: KYLIN-2106
> URL: https://issues.apache.org/jira/browse/KYLIN-2106
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine, Web
> Affects Versions: v1.5.4.1
> Reporter: Richard Calaba
> Assignee: Zhong,Jason
>
> I have noticed a possible bug in the UI; to reproduce:
> 1) Create a cube which, in the Advanced Settings - Rowkeys section, has the encoding of one of the dimensions (in my case the 1st dimension, customer_id) set to 'Integer' - I used length 22. (Do not use the deprecated 'Int'.)
> 2) Save this cube.
> 3) Clone this cube.
> 4) The cloned cube had the same dimension in the Rowkeys section in Edit mode marked as 'Int(deprecated)', and the length of the Int dictionary encoding was set to 'ger' - obviously some issue while parsing the new encoding type - hopefully only in the UI ...

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
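The `.patch` URL trick above yields a mail-formatted patch that `git am` can apply; the same file can also be produced locally with `git format-patch`, which answers the backporting question. A sketch in a throwaway repo (the repo layout, branch name `release-1.5.4.1`, and commit message are illustrative stand-ins, not the real Kylin history):

```shell
set -e
# Throwaway repo standing in for a kylin checkout.
work=$(mktemp -d); cd "$work"
git init -q repo; cd repo
git config user.email demo@example.com
git config user.name demo
git commit -q --allow-empty -m "base"
git branch release-1.5.4.1            # stand-in for the 1.5.4.1 release line
echo fix > file.txt
git add file.txt
git commit -q -m "KYLIN-2106 fix"

# 1) Export the fix as a mail-formatted patch (same format as GitHub's .patch URLs):
git format-patch -1 HEAD -o ..

# 2) Apply it onto the release branch; "git am" keeps the author and commit message:
git checkout -q release-1.5.4.1
git am -q ../0001-KYLIN-2106-fix.patch
git log --oneline -1                  # the fix now sits on release-1.5.4.1
```

For a patch downloaded from GitHub, only step 2 is needed against the 1.5.4.1 checkout; it applies cleanly only if the touched files have not diverged from master.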
[jira] [Commented] (KYLIN-2106) UI bug - Advanced Settings - Rowkeys - new Integer dictionary encoding - could possibly impact also cube metadata
[ https://issues.apache.org/jira/browse/KYLIN-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15592660#comment-15592660 ] Richard Calaba commented on KYLIN-2106:
---
Hi, I see - this is in the master branch, right? Can we generate a patch for 1.5.4.1, as it is the latest release? Is there an easy way to generate that patch from Git?
[jira] [Commented] (KYLIN-2106) UI bug - Advanced Settings - Rowkeys - new Integer dictionary encoding - could possibly impact also cube metadata
[ https://issues.apache.org/jira/browse/KYLIN-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589628#comment-15589628 ] Richard Calaba commented on KYLIN-2106:
---
In addition - it seems this bug is not only in the UI: I am not able to build a cube using the 'Integer' dictionary encoding -> it always fails in the build step "Create HTable", and the log of that step contains only:
#result code:2
I cannot provide diagnostics output - there is another issue causing DiagnosticsCLI to go into an endless loop and flood kylin.log with errors related to some network timeout. BUT using the same cube and changing the encoding to 'Int(deprecated)' works fine -> so there is definitely some problem with this new dictionary encoding in BOTH the UI and the Job Engine.
[jira] [Updated] (KYLIN-2106) UI bug - Advanced Settings - Rowkeys - new Integer dictionary encoding - could possibly impact also cube metadata
[ https://issues.apache.org/jira/browse/KYLIN-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Calaba updated KYLIN-2106:
---
Summary: UI bug - Advanced Settings - Rowkeys - new Integer dictionary encoding - could possibly impact also cube metadata (was: UI bug - Advanced Settings - Rowkeys - could possibly impact also cube metadata)
[jira] [Created] (KYLIN-2106) UI bug - Advanced Settings - Rowkeys - could possibly impact also cube metadata
Richard Calaba created KYLIN-2106:
---
Summary: UI bug - Advanced Settings - Rowkeys - could possibly impact also cube metadata
Key: KYLIN-2106
URL: https://issues.apache.org/jira/browse/KYLIN-2106
Project: Kylin
Issue Type: Bug
Components: Web
Affects Versions: v1.5.4.1
Reporter: Richard Calaba
Assignee: Zhong,Jason

I have noticed a possible bug in the UI; to reproduce:
1) Create a cube which, in the Advanced Settings - Rowkeys section, has the encoding of one of the dimensions (in my case the 1st dimension, customer_id) set to 'Integer' - I used length 22. (Do not use the deprecated 'Int'.)
2) Save this cube.
3) Clone this cube.
4) The cloned cube had the same dimension in the Rowkeys section in Edit mode marked as 'Int(deprecated)', and the length of the Int dictionary encoding was set to 'ger' - obviously some issue while parsing the new encoding type - hopefully only in the UI ...

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
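A hedged guess at the kind of slip described in step 4 (this is an illustration, not Kylin's actual UI code): if the serialized encoding string - something like "integer:22" - is matched against the known encoding names with the deprecated "int" tried before the longer "integer", the prefix match wins and the leftover characters spill into the length field:

```shell
enc="integer:22"                 # encoding:length, as cube metadata might serialize it
name=""; rest=""
for known in int integer; do     # deprecated 'int' tested before the new 'integer'
  case "$enc" in
    "$known"*) name=$known; rest=${enc#"$known"}; break ;;
  esac
done
echo "parsed encoding: $name, leftover: $rest"
```

The result is encoding "int" with "eger:22" left over - consistent with the clone showing 'Int(deprecated)' and garbage ('ger') in the length box. Matching the longest known name first, or splitting on ':' before matching, avoids the slip.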
[jira] [Commented] (KYLIN-2094) Build Step #3 - java.lang.VerifyError: class com.mapr.fs.proto.Common$ServiceData overrides final method getParserForType.()Lcom/google/protobuf/Parser;
[ https://issues.apache.org/jira/browse/KYLIN-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584879#comment-15584879 ] Richard Calaba commented on KYLIN-2094:
---
Ok - confirmed - the removal of the kylin-jdbc-*.jar is a workaround for this bug. Thank you!

> Build Step #3 - java.lang.VerifyError: class com.mapr.fs.proto.Common$ServiceData overrides final method getParserForType.()Lcom/google/protobuf/Parser;
>
> Key: KYLIN-2094
> URL: https://issues.apache.org/jira/browse/KYLIN-2094
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine
> Affects Versions: v1.5.4.1
> Environment: MapR 4.1
> Reporter: Richard Calaba
> Assignee: Dong Li
> Attachments: job_2016_10_13_20_00_25.zip
>
> When running a Cube Build, in step #3 ("Extract Fact Table Distinct Columns") I am getting an error:
> java.lang.VerifyError: class com.mapr.fs.proto.Common$ServiceData overrides final method getParserForType.()Lcom/google/protobuf/Parser;
>     at java.lang.ClassLoader.defineClass1(Native Method)
>     at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
>     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>     at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
>     at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>     at com.mapr.util.zookeeper.ZKDataRetrieval.addServiceDataToMasterMap(ZKDataRetrieval.java:773)
>     at com.mapr.util.zookeeper.ZKDataRetrieval.getServiceMasterData(ZKDataRetrieval.java:284)
>     at org.apache.hadoop.yarn.client.MapRZKBasedRMFailoverProxyProvider.updateCurrentRMAddress(MapRZKBasedRMFailoverProxyProvider.java:100)
>     at org.apache.hadoop.yarn.client.MapRZKBasedRMFailoverProxyProvider.getProxy(MapRZKBasedRMFailoverProxyProvider.java:174)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:73)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:64)
>     at org.apache.hadoop.io.retry.RetryProxy.create(RetryProxy.java:58)
>     at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:89)
>     at org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:70)
>     at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceStart(YarnClientImpl.java:164)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.hadoop.mapred.ResourceMgrDelegate.serviceStart(ResourceMgrDelegate.java:101)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.hadoop.mapred.ResourceMgrDelegate.<init>(ResourceMgrDelegate.java:90)
>     at org.apache.hadoop.mapred.YARNRunner.<init>(YARNRunner.java:114)
>     at org.apache.hadoop.mapred.YarnClientProtocolProvider.create(YarnClientProtocolProvider.java:34)
>     at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:96)
>     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
>     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
>     at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
>     at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
>     at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
>     at org.apache.kylin.engine.mr.common.AbstractHadoopJob.waitForCompletion(AbstractHadoopJob.java:150)
>     at org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.run(FactDistinctColumnsJob.java:108)
>     at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:88)
>     at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:120)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
>     at ...
[jira] [Comment Edited] (KYLIN-2094) Build Step #3 - java.lang.VerifyError: class com.mapr.fs.proto.Common$ServiceData overrides final method getParserForType.()Lcom/google/protobuf/Parser;
[ https://issues.apache.org/jira/browse/KYLIN-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584831#comment-15584831 ] Richard Calaba edited comment on KYLIN-2094 at 10/18/16 8:21 AM:
---
Ok, I have removed the kylin-jdbc-*.jar file from the $KYLIN_HOME/lib directory and restarted the server - now waiting for the Cube Build to pass through step #3 ...
In regards to the 2nd issue - to be able to query cubes built in the previous version of Kylin (1.5.3), I had to do the coprocessor migration below (as advised at https://kylin.apache.org/docs15/howto/howto_update_coprocessor.html):
$KYLIN_HOME/bin/kylin.sh org.apache.kylin.storage.hbase.util.DeployCoprocessorCLI $KYLIN_HOME/lib/kylin-coprocessor-*.jar all

was (Author: cal...@gmail.com):
Ok, I have removed the kylin-jdbc-*.jar file from the $KYLIN_HOME/lib directory and restarted the server - now waiting for the Cube Build to pass through step #3 ...
In regards to the 2nd issue - to be able to query cubes built in the previous version of Kylin (1.5.3), I had to do this in advance, as advised here: https://kylin.apache.org/docs15/howto/howto_update_coprocessor.html
$KYLIN_HOME/bin/kylin.sh org.apache.kylin.storage.hbase.util.DeployCoprocessorCLI $KYLIN_HOME/lib/kylin-coprocessor-*.jar all
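The two-part workaround from this comment can be sketched as a script. KYLIN_HOME points at a scratch directory with placeholder jars here so the demo is safe to run offline; on a real server you would use the actual install, and the restart and coprocessor commands (which need a live cluster) are left commented out:

```shell
# Simulated KYLIN_HOME with placeholder jars (names are illustrative).
KYLIN_HOME=$(mktemp -d)
mkdir -p "$KYLIN_HOME/lib"
touch "$KYLIN_HOME/lib/kylin-jdbc-1.5.4.1.jar" \
      "$KYLIN_HOME/lib/kylin-coprocessor-1.5.4.1.jar"

# 1) Remove the conflicting JDBC driver jar from the server classpath:
rm -f "$KYLIN_HOME"/lib/kylin-jdbc-*.jar

# 2) Restart Kylin, then redeploy the coprocessor to HBase tables built by the
#    older version (command quoted from the comment above):
# $KYLIN_HOME/bin/kylin.sh stop && $KYLIN_HOME/bin/kylin.sh start
# $KYLIN_HOME/bin/kylin.sh org.apache.kylin.storage.hbase.util.DeployCoprocessorCLI \
#     $KYLIN_HOME/lib/kylin-coprocessor-*.jar all

ls "$KYLIN_HOME/lib"    # only the coprocessor jar remains
```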
[jira] [Comment Edited] (KYLIN-2094) Build Step #3 - java.lang.VerifyError: class com.mapr.fs.proto.Common$ServiceData overrides final method getParserForType.()Lcom/google/protobuf/Parser;
[ https://issues.apache.org/jira/browse/KYLIN-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584831#comment-15584831 ] Richard Calaba edited comment on KYLIN-2094 at 10/18/16 8:20 AM:
---
Ok, I have removed the kylin-jdbc-*.jar file from the $KYLIN_HOME/lib directory and restarted the server - now waiting for the Cube Build to pass through step #3 ...
In regards to the 2nd issue - to be able to query cubes built in the previous version of Kylin (1.5.3), I had to do this in advance, as advised here: https://kylin.apache.org/docs15/howto/howto_update_coprocessor.html
$KYLIN_HOME/bin/kylin.sh org.apache.kylin.storage.hbase.util.DeployCoprocessorCLI $KYLIN_HOME/lib/kylin-coprocessor-*.jar all

was (Author: cal...@gmail.com):
Ok, I have removed the kylin-jdbc-*.jar file from the $KYLIN_HOME/lib directory and restarted the server - now waiting for the Cube Build to pass through step #3 ...
To be able to query cubes built in the previous version of Kylin (1.5.3), I had to do this in advance, as advised here: https://kylin.apache.org/docs15/howto/howto_update_coprocessor.html
$KYLIN_HOME/bin/kylin.sh org.apache.kylin.storage.hbase.util.DeployCoprocessorCLI $KYLIN_HOME/lib/kylin-coprocessor-*.jar all
[jira] [Commented] (KYLIN-2094) Build Step #3 - java.lang.VerifyError: class com.mapr.fs.proto.Common$ServiceData overrides final method getParserForType.()Lcom/google/protobuf/Parser;
[ https://issues.apache.org/jira/browse/KYLIN-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584831#comment-15584831 ] Richard Calaba commented on KYLIN-2094:
---
Ok, I have removed the kylin-jdbc-*.jar file from the $KYLIN_HOME/lib directory and restarted the server - now waiting for the Cube Build to pass through step #3 ...
To be able to query cubes built in the previous version of Kylin (1.5.3), I had to do this in advance, as advised here: https://kylin.apache.org/docs15/howto/howto_update_coprocessor.html
$KYLIN_HOME/bin/kylin.sh org.apache.kylin.storage.hbase.util.DeployCoprocessorCLI $KYLIN_HOME/lib/kylin-coprocessor-*.jar all
[jira] [Commented] (KYLIN-2094) Build Step #3 - java.lang.VerifyError: class com.mapr.fs.proto.Common$ServiceData overrides final method getParserForType.()Lcom/google/protobuf/Parser;
[ https://issues.apache.org/jira/browse/KYLIN-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584803#comment-15584803 ] Richard Calaba commented on KYLIN-2094:
---
Ok - thanks - will try and reconfirm. Another issue is that a cube successfully built in Kylin 1.5.3 cannot be queried in 1.5.4.1 (the query doesn't return a result) - I think I also saw some protobuf exceptions in the logs; hopefully it is the same issue ... going to check now ...
[jira] [Commented] (KYLIN-2104) loader constraint violation: loader (instance of org/apache/catalina/loader/WebappClassLoader) previously initiated loading for a different type with name "com/google/p
[ https://issues.apache.org/jira/browse/KYLIN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584777#comment-15584777 ] Richard Calaba commented on KYLIN-2104: --- FINALLY SEEMS I HAVE A SOLUTION: Did update apache-maven (previous version Apache Maven 3.0.5 (Red Hat 3.0.5-16)) -> updated to Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T12:41:47-04:00) The other working server Maven version was 3.3.3 Have also deleted all old repository cache in home directory for npm (~/.npm) and for maven (~/.m2) Either the repository cache cleanup or the version update resolved my problem . After that I am able to build BIN version from sources (for any kylin git tag - 1.5.3 / 1.5.4.1 / master) and the BIN compiled version works correctly. Also the BIN packge size difference (~ 20-30 MBs) is gone. In addition I was able to return back the NodeJS version 6.7.0 back - and it is still working. Closing ticket as resolved -> most probably maven version or maven repository cache issue. > loader constraint violation: loader (instance of > org/apache/catalina/loader/WebappClassLoader) previously initiated loading > for a different type with name "com/google/protobuf/ByteString" > --- > > Key: KYLIN-2104 > URL: https://issues.apache.org/jira/browse/KYLIN-2104 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.3 > Environment: MapR 4.1 - Edge node >Reporter: Richard Calaba >Priority: Critical > > Something very odd is with v.1.5.3 compilation & packaging scripts - it seems > that during compilation some req. library is missing or another version is > being used and this is not reported as a compilation error which is causing > issues later in runtime. 
> On my MapR 4.1 system - EDGE node which has all necessary access rights for > hbase/hive + other packaging tools installed I did this: > 1) Followed the https://kylin.apache.org/development/howto_package.html - > with one exception - from git I am not clonning latest master branch but > specific released Kylin version using tag kylin-1.5.3 > 2) The bin package is compiled successfully without any errors being reported > (I believe test cases are skipped this way - so cannot say test cases run ok) > 3) I then installed the successfully compiled Kylin 1.5.3 from sources and > run Kylin - all seems OK. > 4) I defined and successfully build 2 cubes - no issues during the build > process. (Maybe except the fact that Cube size is reported to be 0 Kb on UI > having approx. 350 million rows processed during Build -> that looks more > like some other bug). > 5) If I go to Insights tab in Kylin UI and run any query which should return > some data (350 mil. rows processed during build) I am getting an error: > a) 1st time I run any query - ERROR: loader constraint violation: loader > (instance of org/apache/catalina/loader/WebappClassLoader) previously > initiated loading for a different type with name > "com/google/protobuf/ByteString" > b) 2nd and later times - ERROR says only "com/google/protobuf/ByteString" > 6) If I STOP the Kylin -> replace the whole binary installation with the > officialy released binary package of Kylin 1.5.3 (for HBase 0.98/0.99) - I > can run my queries without any issue > The reason why I am reporting this bug on v 1.5.3 and not on latest released > sources 1.5.4.1 is that I have issues to have 1.5.4.1 working - see > https://issues.apache.org/jira/browse/KYLIN-2094 - Bin release fails in step > #3 of the Build process and 1.5.4.1 compiled from sources doesn't work for > me. > All points to some issues with incorrect dependencies being detected during > compilation and/or runtime ... maybe related to Google's Protobuffers ...??? 
> Anyone has any idea how to debug this problem ?? Basically it makes both > 1.5.3 and 1.5.4.1 not working on my system. > On different system (also MapR 4.1) few months back -> I didn't have those > issues -I was able successfully re-compile sources of 1.5.x versions > including some additional patches relased for them. > Beacuse no errror is reported during the Kylin compilation & packaging > process -> all indicates that there is some strange non-resolved dependency > which was OK on my previous MapR system but is different on my current MapR > system. Could be anything ... > I will try to attach the compiled binary package here so some guru can have a > look and let me know why "successfully" compiled Kylin from sources doesn't > run same as the original BIN release. (BTW my compiled archive is several 30 > MBs larger than the released binary package ...) -- This message was sent by Atlassian JIRA
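The fix reported in the comment above boils down to two actions: make sure the local Maven is recent enough, then wipe the stale dependency caches and rebuild. A minimal sketch of a pre-build check (the "3.3" floor is an assumption inferred from the two versions that worked here, 3.3.3 and 3.3.9 - it is not a documented Kylin requirement - and the cache paths and packaging script name follow the comment and the Kylin packaging how-to):

```shell
# Succeeds when the Maven version string "$1" is >= 3.3 (assumed floor,
# based on the versions that produced a working build in this report).
maven_ok() {
  local v=${1:-0.0}
  local major=${v%%.*}
  local rest=${v#*.}
  local minor=${rest%%.*}
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 3 ]; }
}

# Third field of `mvn -v`'s first line is the version, e.g. "3.3.9".
if maven_ok "$(mvn -v 2>/dev/null | awk 'NR==1 {print $3}')"; then
  echo "Maven looks new enough; next steps from the comment would be:"
  echo "  rm -rf ~/.m2/repository ~/.npm   # drop stale dependency caches"
  echo "  ./script/package.sh              # re-run the Kylin packaging"
else
  echo "Maven missing or older than 3.3 - upgrade before building" >&2
fi
```

The destructive steps are only echoed here; in practice you would run them by hand after confirming the version check passes.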
[jira] [Comment Edited] (KYLIN-2104) loader constraint violation: loader (instance of org/apache/catalina/loader/WebappClassLoader) previously initiated loading for a different type with name "com/goo
[ https://issues.apache.org/jira/browse/KYLIN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584626#comment-15584626 ] Richard Calaba edited comment on KYLIN-2104 at 10/18/16 6:35 AM: - Anyway - this doesn't resolve anything - If I use my own compiled kylin from source-code, after trying to run a Query - I am getting in kylin.log: 1) ===> FIRST ATTEMPT TO RUN QUERY ==[QUERY]=== SQL: select ti.business_date_year, ti.business_date_month, sum(pos_netsales) from mdl_pos_item as fact left outer join mdl_timeinfo as ti on ti.business_date = fact.business_date where fact.business_date between '2016-01-01' and '2016-12-31' group by ti.business_date_year, ti.business_date_month order by ti.business_date_year ASC, ti.business_date_month ASC User: ADMIN Success: false Duration: 0.0 Project: jambajuice_3_0 Realization Names: [jambajuice_3_0_MDL_POS_ITEM] Cuboid Ids: [34] Total scan count: 0 Result row count: 0 Accept Partial: true Is Partial Result: false Hit Exception Cache: false Storage cache used: false Message: loader constraint violation: loader (instance of org/apache/catalina/loader/WebappClassLoader) previously initiated loading for a different type with name "com/google/protobuf/ByteString" ==[QUERY]=== 2016-10-18 02:34:25,077 ERROR [http-bio-7070-exec-6] controller.BasicController:44 : org.apache.kylin.rest.exception.InternalErrorException: loader constraint violation: loader (instance of org/apache/catalina/loader/WebappClassLoader) previously initiated loading for a different type with name "com/google/protobuf/ByteString" at org.apache.kylin.rest.controller.QueryController.doQueryWithCache(QueryController.java:224) at org.apache.kylin.rest.controller.QueryController.query(QueryController.java:94) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.lang.reflect.Method.invoke(Method.java:606) at org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:213) at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:126) at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:96) at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:617) at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:578) at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:80) at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923) at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852) at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882) at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789) at javax.servlet.http.HttpServlet.service(HttpServlet.java:650) at javax.servlet.http.HttpServlet.service(HttpServlet.java:731) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330) at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:118) 
at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:84) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at
[jira] [Commented] (KYLIN-2104) loader constraint violation: loader (instance of org/apache/catalina/loader/WebappClassLoader) previously initiated loading for a different type with name "com/google/p
[ https://issues.apache.org/jira/browse/KYLIN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584626#comment-15584626 ] Richard Calaba commented on KYLIN-2104: --- Anyway - this doesn't resolve anything - If I use my own compiled kylin from source-code, after trying to run a Query - I am getting in kylin.log: ==[QUERY]=== SQL: select ti.business_date_year, ti.business_date_month, sum(pos_netsales) from mdl_pos_item as fact left outer join mdl_timeinfo as ti on ti.business_date = fact.business_date where fact.business_date between '2016-01-01' and '2016-12-31' group by ti.business_date_year, ti.business_date_month order by ti.business_date_year ASC, ti.business_date_month ASC User: ADMIN Success: false Duration: 0.0 Project: jambajuice_3_0 Realization Names: [jambajuice_3_0_MDL_POS_ITEM] Cuboid Ids: [34] Total scan count: 0 Result row count: 0 Accept Partial: true Is Partial Result: false Hit Exception Cache: false Storage cache used: false Message: com/google/protobuf/ByteString ==[QUERY]=== 2016-10-18 02:23:49,232 ERROR [http-bio-7070-exec-3] controller.BasicController:44 : org.apache.kylin.rest.exception.InternalErrorException: com/google/protobuf/ByteString at org.apache.kylin.rest.controller.QueryController.doQueryWithCache(QueryController.java:224) at org.apache.kylin.rest.controller.QueryController.query(QueryController.java:94) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:213) at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:126) at 
org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:96) at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:617) at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:578) at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:80) at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923) at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852) at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882) at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789) at javax.servlet.http.HttpServlet.service(HttpServlet.java:650) at javax.servlet.http.HttpServlet.service(HttpServlet.java:731) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330) at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:118) at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:84) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at 
org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:103) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:113) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at
[jira] [Commented] (KYLIN-2104) loader constraint violation: loader (instance of org/apache/catalina/loader/WebappClassLoader) previously initiated loading for a different type with name "com/google/p
[ https://issues.apache.org/jira/browse/KYLIN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584255#comment-15584255 ] Richard Calaba commented on KYLIN-2104: --- As I cannot attach files that large here, you can download the BIN releases from these links: 1) ORIGINAL Kylin 1.5.3 - https://archive.apache.org/dist/kylin/apache-kylin-1.5.3/apache-kylin-1.5.3-bin.tar.gz 2) My build of Kylin 1.5.3 - git clone with tag kylin-1.5.3, then compilation & packaging from sources with no errors reported - https://drive.google.com/file/d/0Bz5GkHbD3o7KcFhIUVh2LWhCeDg/view?usp=sharing > loader constraint violation: loader (instance of > org/apache/catalina/loader/WebappClassLoader) previously initiated loading > for a different type with name "com/google/protobuf/ByteString" > --- > > Key: KYLIN-2104 > URL: https://issues.apache.org/jira/browse/KYLIN-2104 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.3 > Environment: MapR 4.1 - Edge node >Reporter: Richard Calaba >Priority: Critical > > Something very odd is with v.1.5.3 compilation & packaging scripts - it seems > that during compilation some req. library is missing or another version is > being used and this is not reported as a compilation error which is causing > issues later in runtime. > On my MapR 4.1 system - EDGE node which has all necessary access rights for > hbase/hive + other packaging tools installed I did this: > 1) Followed the https://kylin.apache.org/development/howto_package.html - > with one exception - from git I am not clonning latest master branch but > specific released Kylin version using tag kylin-1.5.3 > 2) The bin package is compiled successfully without any errors being reported > (I believe test cases are skipped this way - so cannot say test cases run ok) > 3) I then installed the successfully compiled Kylin 1.5.3 from sources and > run Kylin - all seems OK. > 4) I defined and successfully build 2 cubes - no issues during the build > process. 
(Maybe except the fact that Cube size is reported to be 0 Kb on UI > having approx. 350 million rows processed during Build -> that looks more > like some other bug). > 5) If I go to Insights tab in Kylin UI and run any query which should return > some data (350 mil. rows processed during build) I am getting an error: > a) 1st time I run any query - ERROR: loader constraint violation: loader > (instance of org/apache/catalina/loader/WebappClassLoader) previously > initiated loading for a different type with name > "com/google/protobuf/ByteString" > b) 2nd and later times - ERROR says only "com/google/protobuf/ByteString" > 6) If I STOP the Kylin -> replace the whole binary installation with the > officialy released binary package of Kylin 1.5.3 (for HBase 0.98/0.99) - I > can run my queries without any issue > The reason why I am reporting this bug on v 1.5.3 and not on latest released > sources 1.5.4.1 is that I have issues to have 1.5.4.1 working - see > https://issues.apache.org/jira/browse/KYLIN-2094 - Bin release fails in step > #3 of the Build process and 1.5.4.1 compiled from sources doesn't work for > me. > All points to some issues with incorrect dependencies being detected during > compilation and/or runtime ... maybe related to Google's Protobuffers ...??? > Anyone has any idea how to debug this problem ?? Basically it makes both > 1.5.3 and 1.5.4.1 not working on my system. > On different system (also MapR 4.1) few months back -> I didn't have those > issues -I was able successfully re-compile sources of 1.5.x versions > including some additional patches relased for them. > Beacuse no errror is reported during the Kylin compilation & packaging > process -> all indicates that there is some strange non-resolved dependency > which was OK on my previous MapR system but is different on my current MapR > system. Could be anything ... 
> I will try to attach the compiled binary package here so some guru can have a > look and let me know why "successfully" compiled Kylin from sources doesn't > run same as the original BIN release. (BTW my compiled archive is several 30 > MBs larger than the released binary package ...) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
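The build being compared above checks out a released tag rather than master. A sketch of that tag checkout, demonstrated on a throwaway local repository so it runs offline; against the real sources the URL would be the Apache Kylin git repository and the tag kylin-1.5.3, as in the steps above:

```shell
set -e
work=$(mktemp -d)

# Stand-in "upstream" repo with one tagged release commit (hypothetical,
# just so the example is self-contained and needs no network access).
git init -q "$work/upstream"
git -C "$work/upstream" -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "release commit"
git -C "$work/upstream" tag kylin-1.5.3

# Clone the tagged release rather than master; this leaves a detached
# HEAD pointing at the tag, which is fine for building a release.
git clone -q --branch kylin-1.5.3 "$work/upstream" "$work/kylin"
git -C "$work/kylin" describe --tags
```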
[jira] [Updated] (KYLIN-2104) loader constraint violation: loader (instance of org/apache/catalina/loader/WebappClassLoader) previously initiated loading for a different type with name "com/google/pro
[ https://issues.apache.org/jira/browse/KYLIN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Calaba updated KYLIN-2104: -- Description: Something very odd is with v.1.5.3 compilation & packaging scripts - it seems that during compilation some req. library is missing or another version is being used and this is not reported as a compilation error which is causing issues later in runtime. On my MapR 4.1 system - EDGE node which has all necessary access rights for hbase/hive + other packaging tools installed I did this: 1) Followed the https://kylin.apache.org/development/howto_package.html - with one exception - from git I am not clonning latest master branch but specific released Kylin version using tag kylin-1.5.3 2) The bin package is compiled successfully without any errors being reported (I believe test cases are skipped this way - so cannot say test cases run ok) 3) I then installed the successfully compiled Kylin 1.5.3 from sources and run Kylin - all seems OK. 4) I defined and successfully build 2 cubes - no issues during the build process. (Maybe except the fact that Cube size is reported to be 0 Kb on UI having approx. 350 million rows processed during Build -> that looks more like some other bug). 5) If I go to Insights tab in Kylin UI and run any query which should return some data (350 mil. 
rows processed during build) I am getting an error: a) 1st time I run any query - ERROR: loader constraint violation: loader (instance of org/apache/catalina/loader/WebappClassLoader) previously initiated loading for a different type with name "com/google/protobuf/ByteString" b) 2nd and later times - ERROR says only "com/google/protobuf/ByteString" 6) If I STOP the Kylin -> replace the whole binary installation with the officialy released binary package of Kylin 1.5.3 (for HBase 0.98/0.99) - I can run my queries without any issue The reason why I am reporting this bug on v 1.5.3 and not on latest released sources 1.5.4.1 is that I have issues to have 1.5.4.1 working - see https://issues.apache.org/jira/browse/KYLIN-2094 - Bin release fails in step #3 of the Build process and 1.5.4.1 compiled from sources doesn't work for me. All points to some issues with incorrect dependencies being detected during compilation and/or runtime ... maybe related to Google's Protobuffers ...??? Anyone has any idea how to debug this problem ?? Basically it makes both 1.5.3 and 1.5.4.1 not working on my system. On different system (also MapR 4.1) few months back -> I didn't have those issues -I was able successfully re-compile sources of 1.5.x versions including some additional patches relased for them. Beacuse no errror is reported during the Kylin compilation & packaging process -> all indicates that there is some strange non-resolved dependency which was OK on my previous MapR system but is different on my current MapR system. Could be anything ... I will try to attach the compiled binary package here so some guru can have a look and let me know why "successfully" compiled Kylin from sources doesn't run same as the original BIN release. (BTW my compiled archive is several 30 MBs larger than the released binary package ...) was: Something very odd is with v.1.5.3 compilation & packaging scripts - it seems that during compilation some req. 
library is missing or another version is being used and this is not reported as a compilation error which is causing issues later in runtime. On my MapR 4.1 system - EDGE node which has all necessary access rights for hbase/hive + other packaging tools installed I did this: 1) Followed the https://kylin.apache.org/development/howto_package.html - with one exception - from git I am not clonning latest master branch but specific released Kylin version using tag kylin-1.5.3 2) The bin package is compiled successfully without any errors being reported (I believe test cases are skipped this way - so cannot say test cases run ok) 3) I then installed the successfully compiled Kylin 1.5.3 from sources and run Kylin - all seems OK. 4) I defined and successfully build 2 cubes - no issues during the build process. (Maybe except the fact that Cube size is reported to be 0 Kb on UI having approx. 350 million rows processed during Build -> that looks more like some other bug). 5) If I go to Insights tab in Kylin UI and run any query which should return some data (350 mil. rows processed during build) I am getting an error: a) 1st time I run any query - ERROR: loader constraint violation: loader (instance of org/apache/catalina/loader/WebappClassLoader) previously initiated loading for a different type with name "com/google/protobuf/ByteString" b) 2nd and later times - ERROR says only "com/google/protobuf/ByteString" 6) If I STOP the Kylin -> replace the whole binary installation with the officialy released binary package of Kylin 1.5.3 (for HBase
[jira] [Updated] (KYLIN-2094) Build Step #3 - java.lang.VerifyError: class com.mapr.fs.proto.Common$ServiceData overrides final method getParserForType.()Lcom/google/protobuf/Parser;
[ https://issues.apache.org/jira/browse/KYLIN-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Calaba updated KYLIN-2094: -- Attachment: job_2016_10_13_20_00_25.zip > Build Step #3 - java.lang.VerifyError: class > com.mapr.fs.proto.Common$ServiceData overrides final method > getParserForType.()Lcom/google/protobuf/Parser; > > > Key: KYLIN-2094 > URL: https://issues.apache.org/jira/browse/KYLIN-2094 > Project: Kylin > Issue Type: Bug > Components: Job Engine >Affects Versions: v1.5.4.1 > Environment: MapR 4.1 >Reporter: Richard Calaba >Assignee: Dong Li > Attachments: job_2016_10_13_20_00_25.zip > > > When running Cube Build - in Step 3 #3 Step Name: Extract Fact Table Distinct > Columns I am getting an error: > java.lang.VerifyError: class com.mapr.fs.proto.Common$ServiceData overrides > final method getParserForType.()Lcom/google/protobuf/Parser; > at java.lang.ClassLoader.defineClass1(Native Method) > at java.lang.ClassLoader.defineClass(ClassLoader.java:800) > at > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) > at java.net.URLClassLoader.access$100(URLClassLoader.java:71) > at java.net.URLClassLoader$1.run(URLClassLoader.java:361) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > at > com.mapr.util.zookeeper.ZKDataRetrieval.addServiceDataToMasterMap(ZKDataRetrieval.java:773) > at > com.mapr.util.zookeeper.ZKDataRetrieval.getServiceMasterData(ZKDataRetrieval.java:284) > at > org.apache.hadoop.yarn.client.MapRZKBasedRMFailoverProxyProvider.updateCurrentRMAddress(MapRZKBasedRMFailoverProxyProvider.java:100) > at > 
org.apache.hadoop.yarn.client.MapRZKBasedRMFailoverProxyProvider.getProxy(MapRZKBasedRMFailoverProxyProvider.java:174) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.(RetryInvocationHandler.java:73) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.(RetryInvocationHandler.java:64) > at org.apache.hadoop.io.retry.RetryProxy.create(RetryProxy.java:58) > at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:89) > at > org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:70) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceStart(YarnClientImpl.java:164) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.mapred.ResourceMgrDelegate.serviceStart(ResourceMgrDelegate.java:101) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.mapred.ResourceMgrDelegate.(ResourceMgrDelegate.java:90) > at org.apache.hadoop.mapred.YARNRunner.(YARNRunner.java:114) > at > org.apache.hadoop.mapred.YarnClientProtocolProvider.create(YarnClientProtocolProvider.java:34) > at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:96) > at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:83) > at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:76) > at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255) > at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566) > at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279) > at > org.apache.kylin.engine.mr.common.AbstractHadoopJob.waitForCompletion(AbstractHadoopJob.java:150) > at > org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.run(FactDistinctColumnsJob.java:108) > at 
org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:88) > at > org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:120) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57) > at >
[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396108#comment-15396108 ] Richard Calaba commented on KYLIN-1834: --- Hmmm, strange - let me try to reproduce and provide the exact cube metadata so you can import it and look at it. Did you try with Kylin 1.5.2.1 or used latest Kylin sources from git ? > java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build > Dimension Dictionary > -- > > Key: KYLIN-1834 > URL: https://issues.apache.org/jira/browse/KYLIN-1834 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Richard Calaba >Priority: Blocker > Attachments: job_2016_06_28_09_59_12-value-not-found.zip > > > Getting exception in Step 4 - Build Dimension Dictionary: > java.lang.IllegalArgumentException: Value not exists! > at > org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160) > at > org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76) > at > org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96) > at > org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106) > at > org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) > at > org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60) > at > 
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > result code:2 > The code which generates the exception is: > org.apache.kylin.dimension.Dictionary.java: > /** > * A lower level API, return ID integer from raw value bytes. In case of > not found > * > * - if roundingFlag=0, throw IllegalArgumentException; > * - if roundingFlag<0, the closest smaller ID integer if exist; > * - if roundingFlag>0, the closest bigger ID integer if exist. > * > * Bypassing the cache layer, this could be significantly slower than > getIdFromValue(T value). > * > * @throws IllegalArgumentException > * if value is not found in dictionary and rounding is off; > * or if rounding cannot find a smaller or bigger ID > */ > final public int getIdFromValueBytes(byte[] value, int offset, int len, > int roundingFlag) throws IllegalArgumentException { > if (isNullByteForm(value, offset, len)) > return nullId(); > else { > int id = getIdFromValueBytesImpl(value, offset, len, > roundingFlag); > if (id < 0) > throw new IllegalArgumentException("Value not exists!"); > return id; > } > } > == > The Cube is big - fact 110 mio rows, the largest dimension (customer) has 10 > mio rows. I have increased the JVM -Xmx to 16gb and set the > kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube > build doesn't fail (previously we were getting exception complaining about > the 300MB limit for Dimension dictionary size (req. approx 700MB)). 
> == > Before that we were getting exception complaining about the Dictionary > encoding problem - "Too high cardinality is not suitable for dictionary -- > cardinality: 10873977" - this we resolved by changing the affected > dimension/row key Encoding from "dict" to "int; length=8" on the Advanced > Settings of the Cube. >
[jira] [Commented] (KYLIN-1886) When the calculation on measures is supported on Kylin?
[ https://issues.apache.org/jira/browse/KYLIN-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375763#comment-15375763 ] Richard Calaba commented on KYLIN-1886: --- I would like to weigh in as well with some extensions that would be helpful: - Custom Aggregation Types - https://issues.apache.org/jira/browse/KYLIN-976 - if we had generic expression aggregation as part of standard Kylin (and supported in the Kylin UI), we might not need the workaround of defining calculated KPIs in views on top of our fact/lookup tables. - This should allow specifying basic arithmetic expressions (+, -, *, /, mod, div) - or even more advanced ones - and should also support conditional CASE expressions, as requested in https://issues.apache.org/jira/browse/KYLIN-976 - i.e. COUNT(CASE WHEN so.ft = 'fv' THEN soi.sc ELSE NULL END) or SUM(IF(...)). - To give an example: I can express SUM(a+b) as SUM(a) + SUM(b) today in Kylin and SQL, but I cannot do SUM(a/b) this way; I have to define it at the view level so it is applied to every row before it gets aggregated into a Kylin KPI (measure). From https://issues.apache.org/jira/browse/KYLIN-976 I read there is a way coders can provide custom aggregation types - I didn't have time to check and test this approach, but maybe this is how we can achieve it. A generic extension for Kylin could then be: - an option in the UI to define a calculated measure (inputs: an expression string; the set of required measures/columns used in the expression; and a jar file with the custom aggregation implementation) - The implementation could then read values from other measures/columns on the row (passed by the Kylin cube build engine). Yes, we might need to resolve the problem of loops between calculated measures; maybe we allow reading only already-defined measures (so the order of measure definition becomes important). Once we have this, I am pretty sure someone will quickly implement a generic expression parser and
evaluator, and Kylin can be easily enhanced with calculated fields > When the calculation on measures is supported on Kylin? > --- > > Key: KYLIN-1886 > URL: https://issues.apache.org/jira/browse/KYLIN-1886 > Project: Kylin > Issue Type: Test >Reporter: Rahul Choubey > > Suppose we have two measures and we want to do some calculation on top of > these measures instead of doing it on the fly in the SQL query after the cube > is built? Currently this is not supported in Kylin - in which version is this > feature planned? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
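The SUM(a/b) point in the comment above can be illustrated with a short sketch (hypothetical numbers, not tied to any Kylin API): summing per-row ratios gives a different result than dividing the two pre-aggregated sums, which is why the ratio has to be computed at the view/row level before aggregation.

```java
public class RatioMeasureDemo {
    public static void main(String[] args) {
        // hypothetical (a, b) pairs standing in for two fact-table columns
        double[][] rows = { {1, 2}, {3, 4}, {5, 8} };

        double sumA = 0, sumB = 0, sumRatio = 0;
        for (double[] r : rows) {
            sumA += r[0];
            sumB += r[1];
            sumRatio += r[0] / r[1]; // ratio must be computed per row, before aggregation
        }

        System.out.println(sumA / sumB); // SUM(a)/SUM(b) = 9/14, roughly 0.643
        System.out.println(sumRatio);    // SUM(a/b) = 0.5 + 0.75 + 0.625 = 1.875
    }
}
```

The two numbers differ, so SUM(a/b) cannot be decomposed into an expression over SUM(a) and SUM(b) the way SUM(a+b) can.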
[jira] [Commented] (KYLIN-1786) Frontend work for KYLIN-1313 (extended columns as measure)
[ https://issues.apache.org/jira/browse/KYLIN-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375157#comment-15375157 ] Richard Calaba commented on KYLIN-1786: --- Great - is there a chance to generate a patch for 1.5.2.1 so I can test it on the latest released version? > Frontend work for KYLIN-1313 (extended columns as measure) > -- > > Key: KYLIN-1786 > URL: https://issues.apache.org/jira/browse/KYLIN-1786 > Project: Kylin > Issue Type: Improvement > Components: Web >Reporter: Dong Li >Assignee: Zhong,Jason > Fix For: v1.5.3 > > Attachments: 屏幕快照 2016-06-15 12.22.54.png > > > KYLIN-1313 introduced a measure called extendedcolumn, but it seems not to be > enabled on the WebUI - see the attached screenshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372226#comment-15372226 ] Richard Calaba commented on KYLIN-1834: --- Thanks - enjoy your vacation :). If you have trouble reproducing the bug, let me know. > java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build > Dimension Dictionary > -- > > Key: KYLIN-1834 > URL: https://issues.apache.org/jira/browse/KYLIN-1834 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Richard Calaba >Priority: Blocker > Attachments: job_2016_06_28_09_59_12-value-not-found.zip > > > Getting exception in Step 4 - Build Dimension Dictionary: > java.lang.IllegalArgumentException: Value not exists! > at > org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160) > at > org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76) > at > org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96) > at > org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106) > at > org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) > at > org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > 
org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > result code:2 > The code which generates the exception is: > org.apache.kylin.dimension.Dictionary.java: > /** > * A lower level API, return ID integer from raw value bytes. In case of > not found > * > * - if roundingFlag=0, throw IllegalArgumentException; > * - if roundingFlag<0, the closest smaller ID integer if exist; > * - if roundingFlag>0, the closest bigger ID integer if exist. > * > * Bypassing the cache layer, this could be significantly slower than > getIdFromValue(T value). > * > * @throws IllegalArgumentException > * if value is not found in dictionary and rounding is off; > * or if rounding cannot find a smaller or bigger ID > */ > final public int getIdFromValueBytes(byte[] value, int offset, int len, > int roundingFlag) throws IllegalArgumentException { > if (isNullByteForm(value, offset, len)) > return nullId(); > else { > int id = getIdFromValueBytesImpl(value, offset, len, > roundingFlag); > if (id < 0) > throw new IllegalArgumentException("Value not exists!"); > return id; > } > } > == > The Cube is big - fact 110 mio rows, the largest dimension (customer) has 10 > mio rows. I have increased the JVM -Xmx to 16gb and set the > kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube > build doesn't fail (previously we were getting exception complaining about > the 300MB limit for Dimension dictionary size (req. approx 700MB)). 
> == > Before that we were getting exception complaining about the Dictionary > encoding problem - "Too high cardinality is not suitable for dictionary -- > cardinality: 10873977" - this we resolved by changing the affected > dimension/row key Encoding from "dict" to "int; length=8" on the Advanced > Settings of the Cube. > == > We have 2 high-cardinality fields (one from fact table and
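The roundingFlag contract quoted in the message above can be illustrated with a simplified dictionary - a plain sorted array standing in for Kylin's TrieDictionary. This is an illustrative sketch of the documented behavior, not the actual Kylin implementation:

```java
import java.util.Arrays;

// Illustrative sketch of the roundingFlag contract documented on
// Dictionary.getIdFromValueBytes(); the index in a sorted array plays
// the role of the dictionary ID.
public class RoundingLookup {
    private final long[] sortedValues;

    RoundingLookup(long[] sortedValues) { this.sortedValues = sortedValues; }

    int getId(long value, int roundingFlag) {
        int pos = Arrays.binarySearch(sortedValues, value);
        if (pos >= 0) return pos;                 // exact match
        int insertion = -pos - 1;                 // index of the first bigger value
        if (roundingFlag < 0 && insertion > 0)
            return insertion - 1;                 // closest smaller ID, if one exists
        if (roundingFlag > 0 && insertion < sortedValues.length)
            return insertion;                     // closest bigger ID, if one exists
        throw new IllegalArgumentException("Value not exists!");
    }

    public static void main(String[] args) {
        RoundingLookup dict = new RoundingLookup(new long[] {10, 20, 30});
        System.out.println(dict.getId(20, 0));    // exact hit -> ID 1
        System.out.println(dict.getId(25, -1));   // rounds down to 20 -> ID 1
        System.out.println(dict.getId(25, 1));    // rounds up to 30 -> ID 2
        // dict.getId(25, 0) would throw "Value not exists!", as in the failing step
    }
}
```

With roundingFlag=0 a miss throws, which is exactly the path the snapshot build hits in the reported stack trace.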
[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370214#comment-15370214 ] Richard Calaba commented on KYLIN-1834: --- Ok, I have uploaded the 50MB CSV file with the customer ids (bigint) - compressed to 15MB by 7z and 20MB by bzip2 - to my drive here: 7z: https://drive.google.com/file/d/0Bz5GkHbD3o7Kd2JfUmZtYjNMX0k/view?usp=sharing bzip2: https://drive.google.com/file/d/0Bz5GkHbD3o7KRXZlYUdHWW85RlU/view?usp=sharing It should contain 13 645 863 bigints; I didn't check the count. The last byte of the file seems to be hex 1A (EOF) - thus the last bigint in the CSV might not read correctly -> maybe filter it out. To make it work in Kylin I had to increase the max. snapshot size (with all the fields it was over 700MB) - if you use only the bigints it might be a little less ... To allow processing of the high-cardinality customer table in Kylin 1.5.2.1 I did this: 1) In conf/kylin.properties added kylin.table.snapshot.max_mb=2048 2) In bin/setenv.sh set KYLIN_JVM_PROPERTIES=-Xmx16g (the original 4096m wasn't enough; 8g should do) count: 13645863 > java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build > Dimension Dictionary > -- > > Key: KYLIN-1834 > URL: https://issues.apache.org/jira/browse/KYLIN-1834 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Richard Calaba >Priority: Blocker > Attachments: job_2016_06_28_09_59_12-value-not-found.zip
[jira] [Comment Edited] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15369901#comment-15369901 ] Richard Calaba edited comment on KYLIN-1834 at 7/10/16 7:08 PM: Customer ID is the 1st field - BIGINT: BIGINT MIN (-9223372036854775808) < -2857007631392161431 < BIGINT MAX (9223372036854775807). "-2857007631392161431" is a perfectly fine BigInt value ... In addition, I was getting the same exception reporting a positive bigint number as invalid as well. was (Author: cal...@gmail.com): Customer ID is the 1st field - BIGINT: BIGINT MIN (-9223372036854775808) < -2857007631392161431 < BIGINT MAX (9223372036854775807). "-2857007631392161431" is a perfectly fine BigInt value ... > java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build > Dimension Dictionary > -- > > Key: KYLIN-1834 > URL: https://issues.apache.org/jira/browse/KYLIN-1834 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Richard Calaba >Priority: Blocker > Attachments: job_2016_06_28_09_59_12-value-not-found.zip
[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15369901#comment-15369901 ] Richard Calaba commented on KYLIN-1834: --- Customer ID is the 1st field - BIGINT: BIGINT MIN (-9223372036854775808) < -2857007631392161431 < BIGINT MAX (9223372036854775807). "-2857007631392161431" is a perfectly fine BigInt value ... > java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build > Dimension Dictionary > -- > > Key: KYLIN-1834 > URL: https://issues.apache.org/jira/browse/KYLIN-1834 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Richard Calaba >Priority: Blocker > Attachments: job_2016_06_28_09_59_12-value-not-found.zip
[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15369163#comment-15369163 ] Richard Calaba commented on KYLIN-1834: --- This is the record in the lookup table which gets the "Value not exists!" exception in the log, for value -2857007631392161431 (customer_id - bigint). This is the record from the source lookup table - nothing special, only a lot of NULLs: Col-Types: BIGINT_TYPE, STRING_TYPE, INT_TYPE, INT_TYPE, INT_TYPE, INT_TYPE, INT_TYPE, STRING_TYPE, STRING_TYPE, STRING_TYPE, STRING_TYPE, STRING_TYPE, STRING_TYPE, STRING_TYPE, STRING_TYPE, STRING_TYPE, STRING_TYPE, STRING_TYPE, STRING_TYPE, STRING_TYPE, STRING_TYPE, BIGINT_TYPE, STRING_TYPE, STRING_TYPE, STRING_TYPE, STRING_TYPE, STRING_TYPE, STRING_TYPE, BIGINT_TYPE, STRING_TYPE Col-Values: -2857007631392161431 9526ea3e-1359-45db-b872-8c47a9df3e46-28570076313921614311 1 NULLNULLNULLNULLNULLNULLNULLJoe NULL xx...@yahoo.com Joe Joe Joe DoeDoeD NULLNULL80040 NULL NULLNULLNULLNULLNULLNULLNULLNULLNULL 2016-06-23 > java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build > Dimension Dictionary > -- > > Key: KYLIN-1834 > URL: https://issues.apache.org/jira/browse/KYLIN-1834 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Richard Calaba >Priority: Blocker > Attachments: job_2016_06_28_09_59_12-value-not-found.zip
[jira] [Commented] (KYLIN-1827) Send mail notification when runtime exception throws during build/merge cube
[ https://issues.apache.org/jira/browse/KYLIN-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368459#comment-15368459 ] Richard Calaba commented on KYLIN-1827: --- Ok, thanks, I have successfully tested this - but found one problem: the subject line of the email comes as: [ERROR] - [envName] - [projectName] - test_JAMBAJUICE_3_0_REL_TRX_POS_CHECK_W_TDC -- not sure where the envName should be filled in -- also the projectName is not correctly resolved to my project name - otherwise the rest looks fine. It would be cool to have a URL link to go to the UI / step which failed (nice to have) :) So are there any settings for those notification templates? > Send mail notification when runtime exception throws during build/merge cube > > > Key: KYLIN-1827 > URL: https://issues.apache.org/jira/browse/KYLIN-1827 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Affects Versions: v1.5.1, v1.5.2 >Reporter: Ma Gang >Assignee: Ma Gang > > Currently mail notification is only sent in the onExecuteFinished() method, > but no notification is sent when a RuntimeException is thrown; that may cause > users to miss some important job build failures, especially for automated > merge jobs. Sometimes the job state update fails (the HBase metastore is > unavailable for a short time), which makes the job always look like it is in a > running state when it has actually failed; we should send mail to notify the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1844) Hold huge dictionary in 2nd storage like disk/hbase
[ https://issues.apache.org/jira/browse/KYLIN-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368321#comment-15368321 ] Richard Calaba commented on KYLIN-1844: --- Wasn't there an option in previous versions of Kylin to switch off the encoding completely ?? I believe I saw some old discussion on this topic ... It seems I always have to encode a dimension - and Dict is the default. > Hold huge dictionary in 2nd storage like disk/hbase > --- > > Key: KYLIN-1844 > URL: https://issues.apache.org/jira/browse/KYLIN-1844 > Project: Kylin > Issue Type: Improvement > Components: Query Engine >Affects Versions: v1.2, v1.5.2 >Reporter: Abhilash L L >Assignee: liyang > > A whole dimension is kept in memory. > We should have a way to keep only a certain number / size of the total rows > in memory. An LRU cache for rows in the dimension will help keep memory > in check. > Why not store all the dimension data in a different HBase table with a > prefix of dimension id, and map all calls to the dimension (get based on dim > key) to HBase? > This does mean it will cost more time on a miss. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
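The LRU idea in the issue description above can be sketched with a plain LinkedHashMap in access order. This is a hypothetical bounded row cache, not Kylin code; on a miss the caller would fall back to the proposed 2nd storage (disk/HBase):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal size-bounded LRU cache, the kind of structure KYLIN-1844 proposes
// for keeping only the hot dictionary rows in memory.
public class RowLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public RowLruCache(int maxEntries) {
        super(16, 0.75f, true);   // accessOrder=true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least-recently-used row on overflow
    }

    public static void main(String[] args) {
        RowLruCache<Long, String> cache = new RowLruCache<>(2);
        cache.put(1L, "row1");
        cache.put(2L, "row2");
        cache.get(1L);            // touch row 1 so row 2 becomes eldest
        cache.put(3L, "row3");    // evicts row 2
        System.out.println(cache.keySet()); // [1, 3]
    }
}
```

The trade-off is exactly the one noted in the issue: a bounded cache caps memory, but every miss costs an extra round trip to the backing store.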
[jira] [Comment Edited] (KYLIN-1835) Error: java.lang.NumberFormatException: For input count_distinct on Big Int ??? (#7 Step Name: Build Base Cuboid Data)
[ https://issues.apache.org/jira/browse/KYLIN-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368316#comment-15368316 ] Richard Calaba edited comment on KYLIN-1835 at 7/8/16 7:53 PM: --- [~liyang.g...@gmail.com]: You mean I have to map the BigInt ID to Int, i.e. using a view on top of the lookup ??? To achieve it I would have to sort all the values and assign each a row number -> this way I know I have a mapping without collisions ... any other faster / better way ?? And I would have to do the view on both fact and lookup. was (Author: cal...@gmail.com): [~liyang.g...@gmail.com]: You mean I have to map the BigInt ID to Int, i.e. using a view on top of the lookup ??? To achieve it I would have to sort all the values and assign each a row number -> this way I know I have a mapping without collisions ... any other faster / better way ?? > Error: java.lang.NumberFormatException: For input count_distinct on Big Int > ??? (#7 Step Name: Build Base Cuboid Data) > -- > > Key: KYLIN-1835 > URL: https://issues.apache.org/jira/browse/KYLIN-1835 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Richard Calaba >Priority: Minor > > I believe I have discovered an error in Kylin related to count_distinct with > exact precision. > I am not 100% sure - but everything points to a design limit > for count_distinct ... please assess / confirm / reject my observation. > Background info: > = > - large fact table ~ 100 mio rows > - large customer dimension ~ 10 mio rows > Defined 2 KPIs of type COUNT_DISTINCT - with exact precision (return type > bitmap) on 2 high-cardinality fields of type Bigint (# of values expected for > one measure max 15 000 000 distinct values; the 2nd measure can have more > distinct values ~ approx. 50 mio (just an estimate)). > Error info: > > Cube Build runs fine till #7 Step Name: Build Base Cuboid Data - where it > errors out without further details in the Kylin log - it shows only "no counters > for job job_1463699962519_16085". > The MR logs of the job job_1463699962519_16085 show exceptions: > 2016-06-28 02:22:24,019 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : java.lang.NumberFormatException: For input string: > "-6628245177096591402" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Integer.parseInt(Integer.java:495) > at java.lang.Integer.parseInt(Integer.java:527) > at > org.apache.kylin.measure.bitmap.BitmapCounter.add(BitmapCounter.java:63) > at > org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:106) > at > org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:98) > at > org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValueOf(BaseCuboidMapperBase.java:189) > at > org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValue(BaseCuboidMapperBase.java:159) > at > org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.outputKV(BaseCuboidMapperBase.java:206) > at > org.apache.kylin.engine.mr.steps.HiveToBaseCuboidMapper.map(HiveToBaseCuboidMapper.java:53) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) > Just reading the signature of the exception and connecting it to the measure > precision return type "bitmap" => it looks like choosing exact precision (which the UI > says is supported for int types) causes this exception because I am passing a Bigint > field. > If so -> is that a bug (refactoring for big int needed) or is it a design > limitation ??? Can't count_distinct be implemented for bigint (with > exact precision), or do I have to use count_distinct with an error rate instead > ??? > In case I do not need to calculate the count_distinct for all dimension > combinations - I might add some mandatory dimensions to the aggregation > group - but not sure if this would resolve this issue (assuming I keep the > exact precision counts) ... ??? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
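The stack trace above bottoms out in Integer.parseInt, which only accepts values in 32-bit int range; the failing string is a valid 64-bit long. A minimal reproduction of just the parsing step, independent of Kylin:

```java
public class BigintParseDemo {
    public static void main(String[] args) {
        String id = "-6628245177096591402"; // the value from the failing MR task

        long asLong = Long.parseLong(id);   // fits in a 64-bit long, parses fine
        System.out.println(asLong);

        try {
            Integer.parseInt(id);           // exceeds int range
        } catch (NumberFormatException e) {
            // same exception type as the one thrown from BitmapCounter.add()
            System.out.println("NumberFormatException: " + e.getMessage());
        }
    }
}
```

This is consistent with the reporter's reading: the exact-precision bitmap path parses the raw value as an int, so any bigint outside [-2^31, 2^31-1] fails regardless of cube configuration.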
[jira] [Commented] (KYLIN-1835) Error: java.lang.NumberFormatException: For input count_distinct on Big Int ??? (#7 Step Name: Build Base Cuboid Data)
[ https://issues.apache.org/jira/browse/KYLIN-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368316#comment-15368316 ] Richard Calaba commented on KYLIN-1835: --- [~liyang.g...@gmail.com]: You mean I have to map the BigInt ID to Int i.e. using view on top of the lookup ??? To achieve it I would have to sort all the values and assign to it a row number i.e. -> this way I know I have mapping without collisions ... any other faster /better way ?? > Error: java.lang.NumberFormatException: For input count_distinct on Big Int > ??? (#7 Step Name: Build Base Cuboid Data) > -- > > Key: KYLIN-1835 > URL: https://issues.apache.org/jira/browse/KYLIN-1835 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Richard Calaba >Priority: Minor > > I believe I have discovered an error in Kylin realted to count_distinc with > exact precission. > I am not 100% sure - but all points to the fact tha there is a design limit > for count_distinct ... please assess / confirm / reject my observation. > Background info: > = > - large fact table ~ 100 mio rows. > - large customer dimension ~ 10 mio rows > Defined 2 KPIs of type COUNT_DISTINCT - with exact precision (return type > bitmap) on 2 high-cardinality fields of type Bigint (# of values expected for > one measure max 15 000 000 distinct values ; 2nd measure can have more > distinct values ~ approx. 50 mil (just an estimate). > Error info: > > Cube Build runs fine till #7 Step Name: Build Base Cuboid Data - where it > errors out without further details in Kylin Log - it shows only "no counters > for job job_1463699962519_16085". 
> The MR Logs of the job job_1463699962519_16085 show these exceptions: > 2016-06-28 02:22:24,019 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : java.lang.NumberFormatException: For input string: > "-6628245177096591402" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Integer.parseInt(Integer.java:495) > at java.lang.Integer.parseInt(Integer.java:527) > at > org.apache.kylin.measure.bitmap.BitmapCounter.add(BitmapCounter.java:63) > at > org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:106) > at > org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:98) > at > org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValueOf(BaseCuboidMapperBase.java:189) > at > org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValue(BaseCuboidMapperBase.java:159) > at > org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.outputKV(BaseCuboidMapperBase.java:206) > at > org.apache.kylin.engine.mr.steps.HiveToBaseCuboidMapper.map(HiveToBaseCuboidMapper.java:53) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) > Just reading the signature of the exception and connecting it to the Measure > return type "bitmap" => it looks like choosing exact > precision (which the UI says is supported for int types) is causing this > exception because I am passing a Bigint field. > If so -> is that a bug (refactoring for big int needed) or is it a design > limitation ??? 
Can count_distinct be implemented for bigint (with > exact precision), or do I have to use count_distinct with an error rate instead > ??? > In case I do not need to calculate the count_distinct for all dimension > combinations - I might add some mandatory dimensions to the aggregation > group - but I am not sure if this would resolve the issue (assuming I keep the > exact precision counts) ... ??? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
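The NumberFormatException above comes straight from the JDK: the stack trace shows BitmapCounter.add calling Integer.parseInt, which only accepts values in the 32-bit int range, so any bigint key beyond that range blows up before it ever reaches the bitmap. A minimal standalone reproduction of just that parsing behavior (plain JDK, not Kylin code):

```java
public class ParseRangeDemo {
    // True if the string fits in a 32-bit int, i.e. Integer.parseInt succeeds.
    static boolean fitsInt(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String bigintKey = "-6628245177096591402"; // the failing value from the MR log
        System.out.println(fitsInt(bigintKey));        // false: outside [-2^31, 2^31 - 1]
        System.out.println(Long.parseLong(bigintKey)); // a 64-bit long holds it fine
    }
}
```

This supports the reporter's reading: the exact-precision bitmap path parses the raw value as an int, which is a design limit for bigint keys unless they are first remapped into the int range.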
[jira] [Updated] (KYLIN-1863) Discard the Jobs while Dropping/Purging the Cube
[ https://issues.apache.org/jira/browse/KYLIN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Calaba updated KYLIN-1863: -- Description: I have observed that the following scenario on the UI leaves uncleaned metadata in Kylin: 1) I have an error status job in Monitor for my Cube. I drop the cube from the UI. I still see the error status jobs in Monitor after Dropping the Cube. If I try to Discard the job -> I am getting an NPE. Didn't test the same with Purge instead of Drop - but this needs to be checked as well. 2) Not 100% sure - but I have a feeling that if I Drop the cube from the UI before Purging it 1st - some job execution metadata (finished build jobs) stays in the system ... (intermediate tables/HDFS folders/...). It is hard to find proof now that my system is polluted with old job executions. This could be checked while working on 1) above. was: I have observed that the following scenario on the UI leaves uncleaned metadata in Kylin: 1) I have an error status job in Monitor for my Cube. I drop the cube from the UI. I still see the error status jobs in Monitor after Dropping the Cube. If I try to Discard the job -> I am getting an NPE. Didn't test the same with Purge instead of Drop - but this needs to be checked as well. 2) Not 100% sure - but I have a feeling that if I Drop the cube from the UI before Purging it 1st - some job execution metadata (finished build jobs) stays in the system ... (intermediate tables/HDFS folders/...). It is hard to find proof now that my system is polluted with old job executions. This could be checked while working on 1) above. > Discard the Jobs while Dropping/Purging the Cube > --- > > Key: KYLIN-1863 > URL: https://issues.apache.org/jira/browse/KYLIN-1863 > Project: Kylin > Issue Type: Bug >Reporter: Richard Calaba > Fix For: all, v1.5.2, v1.5.2.1 > > > I have observed that the following scenario on the UI leaves uncleaned metadata in > Kylin: > 1) I have an error status job in Monitor for my Cube. I drop the cube from > the UI. 
I still see the error status jobs in Monitor after Dropping the Cube. If > I try to Discard the job -> I am getting an NPE. Didn't test the same with Purge > instead of Drop - but this needs to be checked as well. > 2) Not 100% sure - but I have a feeling that if I Drop the cube from the UI before > Purging it 1st - some job execution metadata (finished build jobs) stays in > the system ... (intermediate tables/HDFS folders/...). It is hard to find > proof now that my system is polluted with old job executions. This could be > checked while working on 1) above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KYLIN-1863) Discard the Jobs while Dropping/Purging the Cube
Richard Calaba created KYLIN-1863: - Summary: Discard the Jobs while Dropping/Purging the Cube Key: KYLIN-1863 URL: https://issues.apache.org/jira/browse/KYLIN-1863 Project: Kylin Issue Type: Bug Reporter: Richard Calaba Fix For: all, v1.5.2.1, v1.5.2 I have observed that the following scenario on the UI leaves uncleaned metadata in Kylin: 1) I have an error status job in Monitor for my Cube. I drop the cube from the UI. I still see the error status jobs in Monitor after Dropping the Cube. If I try to Discard the job -> I am getting an NPE. Didn't test the same with Purge instead of Drop - but this needs to be checked as well. 2) Not 100% sure - but I have a feeling that if I Drop the cube from the UI before Purging it 1st - some job execution metadata (finished build jobs) stays in the system ... (intermediate tables/HDFS folders/...). It is hard to find proof now that my system is polluted with old job executions. This could be checked while working on 1) above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367908#comment-15367908 ] Richard Calaba commented on KYLIN-1834: --- Hello liyang, thank you for the hints. I do not think the snapshot was modified, and if it was, that would have to be another bug in Kylin. We are not loading any data to those tables anymore. I have tested the same scenario several times on several cubes (sharing the model). I was running into the same issue when using both views and tables for the fact and dimension/lookup tables. I was also running into the same issue while building only one cube at a time or building several cubes at a time (based on the same model). I was several times not only restarting the failed step but running the cube build again from scratch. Do I assume correctly that the Value not found! exception is related only to the lookup table content ??? So values in the fact table (for that dimension) are not involved, right ?? If so -> then I can be pretty sure the lookup didn't change and the error still keeps coming. So overall I am 99.9% positive that this (added value to the snapshot table of the lookup) is not the cause (unless Kylin itself has some inconsistency issues and modifies/doesn't persist the whole snapshot). > java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build > Dimension Dictionary > -- > > Key: KYLIN-1834 > URL: https://issues.apache.org/jira/browse/KYLIN-1834 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Richard Calaba >Priority: Blocker > Attachments: job_2016_06_28_09_59_12-value-not-found.zip > > > Getting exception in Step 4 - Build Dimension Dictionary: > java.lang.IllegalArgumentException: Value not exists! 
> at > org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160) > at > org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76) > at > org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96) > at > org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106) > at > org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) > at > org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > result code:2 > The code which generates the exception is: > org.apache.kylin.dimension.Dictionary.java: > /** > * A lower level API, return ID integer from raw value bytes. 
In case of > not found > * > * - if roundingFlag=0, throw IllegalArgumentException; > * - if roundingFlag<0, the closest smaller ID integer if exist; > * - if roundingFlag>0, the closest bigger ID integer if exist. > * > * Bypassing the cache layer, this could be significantly slower than > getIdFromValue(T value). > * > * @throws IllegalArgumentException > * if value is not found in dictionary and rounding is off; > * or if rounding cannot find a smaller or bigger ID > */ > final public int getIdFromValueBytes(byte[] value, int offset, int len, > int roundingFlag) throws IllegalArgumentException { > if (isNullByteForm(value, offset, len)) > return nullId(); > else { > int id = getIdFromValueBytesImpl(value, offset, len, > roundingFlag); > if (id < 0) > throw new
[jira] [Commented] (KYLIN-1642) UI Refresh needed after Purging the Cube before Build ...
[ https://issues.apache.org/jira/browse/KYLIN-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367843#comment-15367843 ] Richard Calaba commented on KYLIN-1642: --- As the https://issues.apache.org/jira/browse/KYLIN-1647 is closed I believe this is also resolved. > UI Refresh needed after Purging the Cube before Build ... > -- > > Key: KYLIN-1642 > URL: https://issues.apache.org/jira/browse/KYLIN-1642 > Project: Kylin > Issue Type: Bug > Components: General, Web >Affects Versions: all, v1.5.1 >Reporter: Richard Calaba >Priority: Trivial > > Hello, a minor bug on the Web UI of Kylin discovered: > - After calling Purge on a Cube and then immediately trying to Build the cube > again (selecting Today as the time dimension selection for the new segment) > I was getting an error that the selected date (of the new segment) should be > bigger than the date of the last loaded segment - which was already Purged ... > - I had to refresh the Web UI in order to be able to schedule the Build of > the empty cube > Seems the Purge needs to refresh the WebUI metadata about the loaded segments > ... > Observed on version 1.5.1 but assuming all are affected ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1830) Put KYLIN_JVM_SETTINGS to kylin.properties
[ https://issues.apache.org/jira/browse/KYLIN-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367146#comment-15367146 ] Richard Calaba commented on KYLIN-1830: --- Fair enough - I had no idea, so I did my own property parsing to support commented parameters, parameters in quotes, splitting only after the 1st '=' sign, etc. ... if all this is supported by bin/get-properties.sh then fine -> it will be even faster, as my code scans the property file every time export_property_override is used instead of an export parameter. It was designed this way to make the old setenv.sh and the new one very similar (export_property_override instead of export =). Also my implementation supports specifying a default value in the script so it doesn't have to be overridden in the kylin.properties file. > Put KYLIN_JVM_SETTINGS to kylin.properties > -- > > Key: KYLIN-1830 > URL: https://issues.apache.org/jira/browse/KYLIN-1830 > Project: Kylin > Issue Type: Improvement >Reporter: Richard Calaba >Priority: Minor > Labels: newbie > Attachments: kylin.properties, setenv.sh > > > Currently the KYLIN_JVM_SETTINGS variable is stored in ./bin/setenv.sh > ... which is not wrong, but as we also have some other memory-specific > settings in the ./conf/kylin.properties file (like e.g. > kylin.job.mapreduce.default.reduce.input.mb or kylin.table.snapshot.max_mb) > it might be a good idea to have those performance and sizing related parameters > in one location. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
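As a point of comparison for the hand-rolled shell parser discussed above, java.util.Properties already implements the same parsing rules on the JVM side: '#' lines are comments, the key/value split happens on the first '=' only, and a default can be supplied in code instead of in kylin.properties. A small sketch (the key name here is illustrative, not a real Kylin property):

```java
import java.io.StringReader;
import java.util.Properties;

public class PropsDemo {
    public static void main(String[] args) throws Exception {
        Properties p = new Properties();
        // Properties.load: '#' starts a comment, value begins after the first '='.
        p.load(new StringReader("kylin.job.jvm.settings=-Xms1g -Xmx16g\n#commented=x\n"));
        System.out.println(p.getProperty("kylin.job.jvm.settings")); // -Xms1g -Xmx16g
        // The default lives in code, so it need not be overridden in the file:
        System.out.println(p.getProperty("missing.key", "default-value"));
    }
}
```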
[jira] [Created] (KYLIN-1857) Show available memory on UI - in System Tab (and other runtime statistics)
Richard Calaba created KYLIN-1857: - Summary: Show available memory on UI - in System Tab (and other runtime statistics) Key: KYLIN-1857 URL: https://issues.apache.org/jira/browse/KYLIN-1857 Project: Kylin Issue Type: Improvement Affects Versions: v1.5.2, v1.5.2.1 Reporter: Richard Calaba Priority: Minor I have run into a situation where Kylin dies (the exception in the log says heap out of memory) if I try to run 3 parallel cube builds with high-cardinality dimensions. It is a reproducible scenario. I have set the max snapshot size to 2GB and -Xmx to 16GB. If I run the cube builds one-by-one -> Kylin doesn't die. As we have no idea about memory requirements before we start building the cube(s), for now it would be beneficial at least to monitor basic Kylin JVM statistics, i.e.: -- current memory occupied by snapshots -- total memory allocation & total free memory -- how many (and which) temporary (intermediate) objects (in hive/hbase/filesystem) are created ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KYLIN-1856) Kylin shows old error in job step output after resume - specifically in #4 Step Name: Build Dimension Dictionary
Richard Calaba created KYLIN-1856: - Summary: Kylin shows old error in job step output after resume - specifically in #4 Step Name: Build Dimension Dictionary Key: KYLIN-1856 URL: https://issues.apache.org/jira/browse/KYLIN-1856 Project: Kylin Issue Type: Bug Affects Versions: v1.5.2, v1.5.2.1 Reporter: Richard Calaba Priority: Minor I have realized that if my job stops with an error and I try to recover the error and resume the job - then the latest step starts again from scratch. This is fine, but in my opinion the log of the Step should be cleared as well - now it is showing the error from my previous attempt. Specifically observed in #4 Step Name: Build Dimension Dictionary - but it is probably a generic issue. Ask is: clear the log of the Build Step when the job Step is resumed. Already when the job step is restarted, not after it is completed. (if Kylin fails, e.g. from out of memory - it silently dies, and analyzing the step log shows the wrong error (from the previous run) - if it were empty -> I would know that the most probable cause was that Kylin died) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-1856) Kylin shows old error in job step output after resume - specifically in #4 Step Name: Build Dimension Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Calaba updated KYLIN-1856: -- Description: I have realized that if my job stops with an error and I try to recover the error and resume the job - then the latest step starts again from scratch. This is fine, but in my opinion the log of the Step should be cleared as well - now it is showing the error from my previous attempt. Specifically observed in #4 Step Name: Build Dimension Dictionary - but it is probably a generic issue. To correct this: clear the log of the Build Step after the job Step is resumed. Already when the job step is restarted, not after it is completed. (if Kylin fails, e.g. from out of memory - it silently dies, and analyzing the step log shows the wrong error (from the previous run) - if it were empty -> I would know that the most probable cause was that Kylin died) was: I have realized that if my job stops with an error and I try to recover the error and resume the job - then the latest step starts again from scratch. This is fine, but in my opinion the log of the Step should be cleared as well - now it is showing the error from my previous attempt. Specifically observed in #4 Step Name: Build Dimension Dictionary - but it is probably a generic issue. To correct this: clear the log of the Build Step after the job Step is resumed. Already when the job step is restarted, not after it is completed. (if Kylin fails i.e. 
for out of memory - it silently dies, and analyzing the step log shows the wrong error (from the previous run) - if it were empty -> I would know that the most probable cause was that Kylin died) > Kylin shows old error in job step output after resume - specifically in #4 > Step Name: Build Dimension Dictionary > > > Key: KYLIN-1856 > URL: https://issues.apache.org/jira/browse/KYLIN-1856 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Richard Calaba >Priority: Minor > > I have realized that if my job stops with an error and I try to recover the > error and resume the job - then the latest step starts again from scratch. > This is fine, but in my opinion the log of the Step should be cleared as well - now > it is showing the error from my previous attempt. > Specifically observed in #4 Step Name: Build Dimension Dictionary - but it is > probably a generic issue. > To correct this: clear the log of the Build Step after the job Step is > resumed. Already when the job step is restarted, not after it is completed. > (if Kylin fails, e.g. from out of memory - it silently dies, and analyzing the > step log shows the wrong error (from the previous run) - if it were empty -> I > would know that the most probable cause was that Kylin died) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365366#comment-15365366 ] Richard Calaba edited comment on KYLIN-1834 at 7/7/16 12:14 AM: Adding further logger statements, I found out that the TrieEncoding fails in the method lookupSeqNoFromValue, in the last else branch ("else { // children are ordered by their first value byte") of the while loop:

while (true) {
    p = c + firstByteOffset;
    comp = BytesUtil.compareByteUnsigned(trieBytes[p], inpByte);
    if (comp == 0) {
        // continue in the matching child, reset n and loop again
        n = c;
        o++;
        break;
    } else if (comp < 0) {
        // try next child
        seq += BytesUtil.readUnsigned(trieBytes, c + sizeChildOffset, sizeNoValuesBeneath);
        if (checkFlag(c, BIT_IS_LAST_CHILD))
            return roundSeqNo(roundingFlag, seq - 1, -1, seq); // no child can match the next byte of input
        c = p + BytesUtil.readUnsigned(trieBytes, p - 1, 1);
    } else {
        // children are ordered by their first value byte
        // THIS CODE IS CAUSING RETURN -1
        return roundSeqNo(roundingFlag, seq - 1, -1, seq); // no child can match the next byte of input
    }
}

private int roundSeqNo(int roundingFlag, int i, int j, int k) {
    if (roundingFlag == 0)
        return j;
    else if (roundingFlag < 0)
        return i;
    else
        return k;
}

The roundingFlag is set to 0; we are using Int encoding with length = 8 for the affected Dimension; the dimension ID being reported in the "Value not found" error is of type Bigint. That's pretty much all I can figure out ... so now a question to the guru who wrote the TrieDictionary encoding logic ... why is this code failing here ??? 
BTW: Sorry for the code formatting - JIRA really sucks in this > java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build > Dimension Dictionary > -- > > Key: KYLIN-1834 > URL: https://issues.apache.org/jira/browse/KYLIN-1834 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Richard Calaba >Priority: Blocker > Attachments: job_2016_06_28_09_59_12-value-not-found.zip > > > Getting exception in Step 4 - Build Dimension Dictionary: > java.lang.IllegalArgumentException: Value not exists! 
> at > org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160) > at > org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76) > at
[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365366#comment-15365366 ] Richard Calaba commented on KYLIN-1834: --- Adding further logger statements, I found out that the TrieEncoding fails in the method lookupSeqNoFromValue, in the last else branch ("else { // children are ordered by their first value byte") of the while loop:

while (true) {
    p = c + firstByteOffset;
    comp = BytesUtil.compareByteUnsigned(trieBytes[p], inpByte);
    if (comp == 0) {
        // continue in the matching child, reset n and loop again
        n = c;
        o++;
        break;
    } else if (comp < 0) {
        // try next child
        seq += BytesUtil.readUnsigned(trieBytes, c + sizeChildOffset, sizeNoValuesBeneath);
        if (checkFlag(c, BIT_IS_LAST_CHILD))
            return roundSeqNo(roundingFlag, seq - 1, -1, seq); // no child can match the next byte of input
        c = p + BytesUtil.readUnsigned(trieBytes, p - 1, 1);
    } else {
        // children are ordered by their first value byte
        // THIS CODE IS CAUSING RETURN -1
        return roundSeqNo(roundingFlag, seq - 1, -1, seq); // no child can match the next byte of input
    }
}

private int roundSeqNo(int roundingFlag, int i, int j, int k) {
    if (roundingFlag == 0)
        return j;
    else if (roundingFlag < 0)
        return i;
    else
        return k;
}

The roundingFlag is set to 0; we are using Int encoding with length = 8 for the affected Dimension; the dimension ID being reported in the "Value not found" error is of type Bigint. That's pretty much all I can figure out ... so now a question to the guru who wrote the TrieDictionary encoding logic ... why is this code failing here ??? > java.lang.IllegalArgumentException: Value not exists! 
- in Step 4 - Build > Dimension Dictionary > -- > > Key: KYLIN-1834 > URL: https://issues.apache.org/jira/browse/KYLIN-1834 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Richard Calaba >Priority: Blocker > Attachments: job_2016_06_28_09_59_12-value-not-found.zip > > > Getting exception in Step 4 - Build Dimension Dictionary: > java.lang.IllegalArgumentException: Value not exists! > at > org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160) > at > org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76) > at > org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96) > at > org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106) > at > org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) > at > org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > result code:2 > The code which generates the exception is: > org.apache.kylin.dimension.Dictionary.java: > /** > * A lower level API, return ID integer from raw value bytes. In case of > not found > * > * - if roundingFlag=0, throw IllegalArgumentException; > * - if roundingFlag<0, the closest smaller ID integer if exist; > * - if roundingFlag>0, the closest bigger ID integer if exist. > * > * Bypassing the cache
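[Editorial note] For readers following the analysis above: the roundSeqNo helper quoted in that comment can be exercised standalone. With roundingFlag = 0 (the path used during snapshot building, per the stack trace), any trie mismatch maps straight to -1, which Dictionary.getIdFromValueBytes then converts into the "Value not exists!" exception. A minimal sketch — the class wrapper and main are mine; the method body is copied from the snippet above:

```java
// Standalone sketch of TrieDictionary.roundSeqNo, as quoted in the comment above.
public class RoundSeqNoDemo {
    // i = closest smaller seq, j = "not found" marker (-1), k = closest bigger seq
    static int roundSeqNo(int roundingFlag, int i, int j, int k) {
        if (roundingFlag == 0)
            return j;      // exact match required -> report not found
        else if (roundingFlag < 0)
            return i;      // round down to the closest smaller seq
        else
            return k;      // round up to the closest bigger seq
    }

    public static void main(String[] args) {
        int seq = 5;
        System.out.println(roundSeqNo(0, seq - 1, -1, seq));  // -1: exact lookup, not found
        System.out.println(roundSeqNo(-1, seq - 1, -1, seq)); // 4: rounded down
        System.out.println(roundSeqNo(1, seq - 1, -1, seq));  // 5: rounded up
    }
}
```

So the -1 is not itself the bug; it only signals that the byte-level trie walk failed to match the input value earlier in lookupSeqNoFromValue.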
[jira] [Comment Edited] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365245#comment-15365245 ] Richard Calaba edited comment on KYLIN-1834 at 7/6/16 10:53 PM: To further debug the issue I have modified TrieDictionary.java to add additional log info to method getIdFromValueBytesImpl: @Override protected int getIdFromValueBytesImpl(byte[] value, int offset, int len, int roundingFlag) { int seq = lookupSeqNoFromValue(headSize, value, offset, offset + len, roundingFlag); int id = calcIdFromSeqNo(seq); if (id < 0) { logger.error("Not a valid value: " + bytesConvert.convertFromBytes(value, offset, len)); logger.error("Seq (="+seq+") returned by lookupSeqNoFromValue (headSize="+headSize+", value="+value+", offset="+offset+", len="+len+", roundingFlag="+roundingFlag); logger.error("Id (="+id+") returned by calcIdFromSeqNo(seq) with nValues="+nValues+", baseId="+baseId); } return id; } Now I see this in kylin log: 2016-07-06 16:57:16,912 ERROR [pool-2-thread-7] dict.TrieDictionary:174 : Not a valid value: -2857007631392161431 2016-07-06 16:57:16,912 ERROR [pool-2-thread-7] dict.TrieDictionary:175 : Seq (=-1) returned by lookupSeqNoFromValue (headSize=64, value=[B@12647ae0, offset=0, len=20, roundingFlag=0 2016-07-06 16:57:16,912 ERROR [pool-2-thread-7] dict.TrieDictionary:176 : Id (=-1) returned by calcIdFromSeqNo(seq) with nValues=44703717, baseId=0 2016-07-06 16:57:16,917 ERROR [pool-2-thread-7] execution.AbstractExecutable:62 : error execute HadoopShellExecutable{id=21521c0a-c06f-4ee9-b682-2c468bfaf526-03, name=Build Dimension Dictionary, state=RUNNING} java.lang.IllegalArgumentException: Value not exists! at org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160 So definitely the method lookupSeqNoFromValue fails while trying to encode the value: nValues= 44703717 - ??? 
not sure where this number comes from:
- # of distinct ids (customer_id) in the fact table is 10 873 977
- # of distinct ids (customer_id) in the lookup table is 13 645 863
- # of distinct IDs (transaction_id) - another high-cardinality dimension without a lookup table - is 115 732 839
- # of distinct combinations in the fact table of date / customer_id (2nd lookup table in the model using the high-cardinality dimension) is 31 663 787
So no idea where nValues=44703717 comes from ... Method lookupSeqNoFromValue source: private int lookupSeqNoFromValue(int n, byte[] inp, int o, int inpEnd, int roundingFlag) { if (o == inpEnd) // special 'empty' value return checkFlag(headSize, BIT_IS_END_OF_VALUE) ? 0 : roundSeqNo(roundingFlag, -1, -1, 0); int seq = 0; // the sequence no under track while (true) { // match the current node, note [0] of node's value has been matched // when this node is selected by its parent int p = n + firstByteOffset; // start of node's value int end = p + BytesUtil.readUnsigned(trieBytes, p - 1, 1); // end of node's value for (p++; p < end && o < inpEnd; p++, o++) { // note matching start from [1] if (trieBytes[p] != inp[o]) { int comp = BytesUtil.compareByteUnsigned(trieBytes[p], inp[o]); if (comp < 0) { seq += BytesUtil.readUnsigned(trieBytes, n + sizeChildOffset, sizeNoValuesBeneath); } return roundSeqNo(roundingFlag, seq - 1, -1, seq); // mismatch } } // node completely matched, is input all consumed? boolean isEndOfValue = checkFlag(n, BIT_IS_END_OF_VALUE); if (o == inpEnd) { return p == end && isEndOfValue ? 
seq : roundSeqNo(roundingFlag, seq - 1, -1, seq); // input all matched } if (isEndOfValue) seq++; // find a child to continue int c = headSize + (BytesUtil.readUnsigned(trieBytes, n, sizeChildOffset) & childOffsetMask); if (c == headSize) // has no children return roundSeqNo(roundingFlag, seq - 1, -1, seq); // input only partially matched byte inpByte = inp[o]; int comp; while (true) { p = c + firstByteOffset; comp = BytesUtil.compareByteUnsigned(trieBytes[p], inpByte); if (comp == 0) { // continue in the matching child, reset n and loop again n = c; o++; break; } else if (comp < 0) { // try next child seq +=
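[Editorial note] A side note on the child-scan loop quoted above: the trie stores children ordered by their unsigned first byte, so comparisons must be unsigned. The sketch below is a plausible re-implementation of BytesUtil.compareByteUnsigned (an illustration, not the actual Kylin source) showing how the signed and unsigned views of the same byte diverge — the kind of detail that matters when the dictionary holds serialized bigint keys whose high bytes are >= 0x80:

```java
// Illustrative unsigned byte comparison. Java bytes are signed, so under the
// default comparison (byte) 0x80 (= -128) sorts *before* (byte) 0x01, while in
// the trie's unsigned ordering 0x80 (= 128) sorts *after* it.
public class UnsignedCompareDemo {
    static int compareByteUnsigned(byte a, byte b) {
        return (a & 0xFF) - (b & 0xFF); // mask to 0..255, then compare as ints
    }

    public static void main(String[] args) {
        byte hi = (byte) 0x80; // 128 unsigned, -128 signed
        byte lo = (byte) 0x01;
        System.out.println(hi < lo);                         // true: signed view
        System.out.println(compareByteUnsigned(hi, lo) > 0); // true: unsigned view
    }
}
```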
[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365245#comment-15365245 ] Richard Calaba commented on KYLIN-1834: --- To further debug the issue I have modified TrieDictionary.java to add additional log info to method getIdFromValueBytesImpl: @Override protected int getIdFromValueBytesImpl(byte[] value, int offset, int len, int roundingFlag) { int seq = lookupSeqNoFromValue(headSize, value, offset, offset + len, roundingFlag); int id = calcIdFromSeqNo(seq); if (id < 0) { logger.error("Not a valid value: " + bytesConvert.convertFromBytes(value, offset, len)); logger.error("Seq (="+seq+") returned by lookupSeqNoFromValue (headSize="+headSize+", value="+value+", offset="+offset+", len="+len+", roundingFlag="+roundingFlag); logger.error("Id (="+id+") returned by calcIdFromSeqNo(seq) with nValues="+nValues+", baseId="+baseId); } return id; } Now I see this in kylin log: 2016-07-06 16:57:16,912 ERROR [pool-2-thread-7] dict.TrieDictionary:174 : Not a valid value: -2857007631392161431 2016-07-06 16:57:16,912 ERROR [pool-2-thread-7] dict.TrieDictionary:175 : Seq (=-1) returned by lookupSeqNoFromValue (headSize=64, value=[B@12647ae0, offset=0, len=20, roundingFlag=0 2016-07-06 16:57:16,912 ERROR [pool-2-thread-7] dict.TrieDictionary:176 : Id (=-1) returned by calcIdFromSeqNo(seq) with nValues=44703717, baseId=0 2016-07-06 16:57:16,917 ERROR [pool-2-thread-7] execution.AbstractExecutable:62 : error execute HadoopShellExecutable{id=21521c0a-c06f-4ee9-b682-2c468bfaf526-03, name=Build Dimension Dictionary, state=RUNNING} java.lang.IllegalArgumentException: Value not exists! at org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160 So definitely the method lookupSeqNoFromValue fails while trying to encode the value: nValues= 44703717 - not sure where this number comes from - # of distinct ids in the dimenson is approx. 
13 mio Method lookupSeqNoFromValue source: private int lookupSeqNoFromValue(int n, byte[] inp, int o, int inpEnd, int roundingFlag) { if (o == inpEnd) // special 'empty' value return checkFlag(headSize, BIT_IS_END_OF_VALUE) ? 0 : roundSeqNo(roundingFlag, -1, -1, 0); int seq = 0; // the sequence no under track while (true) { // match the current node, note [0] of node's value has been matched // when this node is selected by its parent int p = n + firstByteOffset; // start of node's value int end = p + BytesUtil.readUnsigned(trieBytes, p - 1, 1); // end of node's value for (p++; p < end && o < inpEnd; p++, o++) { // note matching start from [1] if (trieBytes[p] != inp[o]) { int comp = BytesUtil.compareByteUnsigned(trieBytes[p], inp[o]); if (comp < 0) { seq += BytesUtil.readUnsigned(trieBytes, n + sizeChildOffset, sizeNoValuesBeneath); } return roundSeqNo(roundingFlag, seq - 1, -1, seq); // mismatch } } // node completely matched, is input all consumed? boolean isEndOfValue = checkFlag(n, BIT_IS_END_OF_VALUE); if (o == inpEnd) { return p == end && isEndOfValue ? 
seq : roundSeqNo(roundingFlag, seq - 1, -1, seq); // input all matched } if (isEndOfValue) seq++; // find a child to continue int c = headSize + (BytesUtil.readUnsigned(trieBytes, n, sizeChildOffset) & childOffsetMask); if (c == headSize) // has no children return roundSeqNo(roundingFlag, seq - 1, -1, seq); // input only partially matched byte inpByte = inp[o]; int comp; while (true) { p = c + firstByteOffset; comp = BytesUtil.compareByteUnsigned(trieBytes[p], inpByte); if (comp == 0) { // continue in the matching child, reset n and loop again n = c; o++; break; } else if (comp < 0) { // try next child seq += BytesUtil.readUnsigned(trieBytes, c + sizeChildOffset, sizeNoValuesBeneath); if (checkFlag(c, BIT_IS_LAST_CHILD)) return roundSeqNo(roundingFlag, seq - 1, -1, seq); // no child can match the next byte of input c = p + BytesUtil.readUnsigned(trieBytes, p - 1, 1); } else { // children are ordered by their first value byte return roundSeqNo(roundingFlag, seq - 1, -1, seq); // no child
[jira] [Comment Edited] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363777#comment-15363777 ] Richard Calaba edited comment on KYLIN-1834 at 7/6/16 7:26 AM: --- Addition: so even dictionary encoding type Int, length 25 - same Value not found exception reached. The dimension has 10 mio. distinct IDs. I didn't find any way in Kylin 1.5.2.1 to process such dimension. Seems Kylin doesn't support high cardinality in current design. Analysis of the code - the error is raised in TrieDictionary.java - line 172 [if (id < 0)] is not fulfilled ... cause most probably in method lookupSeqNoFromValue @Override protected int getIdFromValueBytesImpl(byte[] value, int offset, int len, int roundingFlag) { int seq = lookupSeqNoFromValue(headSize, value, offset, offset + len, roundingFlag); int id = calcIdFromSeqNo(seq); if (id < 0) logger.error("Not a valid value: " + bytesConvert.convertFromBytes(value, offset, len)); return id; } was (Author: cal...@gmail.com): Addition: so even dictionary encoding type Int, length 25 - same Value not found exception reached. The dimension has 10 mio. distinct IDs. I didn't find any way in Kylin 1.5.2.1 to process such dimension. Seems Kylin doesn't support high cardinality in current design. > java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build > Dimension Dictionary > -- > > Key: KYLIN-1834 > URL: https://issues.apache.org/jira/browse/KYLIN-1834 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Richard Calaba >Priority: Blocker > Attachments: job_2016_06_28_09_59_12-value-not-found.zip > > > Getting exception in Step 4 - Build Dimension Dictionary: > java.lang.IllegalArgumentException: Value not exists! 
> at > org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160) > at > org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76) > at > org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96) > at > org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106) > at > org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) > at > org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > result code:2 > The code which generates the exception is: > org.apache.kylin.dimension.Dictionary.java: > /** > * A lower level API, return ID integer from raw value bytes. 
In case of > not found > * > * - if roundingFlag=0, throw IllegalArgumentException; > * - if roundingFlag<0, the closest smaller ID integer if exist; > * - if roundingFlag>0, the closest bigger ID integer if exist. > * > * Bypassing the cache layer, this could be significantly slower than > getIdFromValue(T value). > * > * @throws IllegalArgumentException > * if value is not found in dictionary and rounding is off; > * or if rounding cannot find a smaller or bigger ID > */ > final public int getIdFromValueBytes(byte[] value, int offset, int len, > int roundingFlag) throws IllegalArgumentException { > if (isNullByteForm(value, offset, len)) > return nullId(); > else { > int id = getIdFromValueBytesImpl(value, offset, len, >
[jira] [Comment Edited] (KYLIN-1827) Send mail notification when runtime exception throws during build/merge cube
[ https://issues.apache.org/jira/browse/KYLIN-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363811#comment-15363811 ] Richard Calaba edited comment on KYLIN-1827 at 7/6/16 5:40 AM: --- Hello, I wonder where the SMTP server used for sending the mail notifications is configured? I didn't find any setting, and also didn't find any document describing how to configure notifications in Kylin ... Searching a little more, I found JIRA https://issues.apache.org/jira/browse/KYLIN-672, which mentions these kylin.properties entries: If true, will send email notification; mail.enabled=false mail.host= mail.username= mail.password= mail.sender= So it looks like these would be the settings ... but that branch is for 2.x, so I am not sure whether the notification configuration works in 1.5.x ... can anyone confirm? was (Author: cal...@gmail.com): Hello, I wonder where the SMTP server used for sending the mail notifications is configured? I didn't find any setting, and also didn't find any document describing how to configure notifications in Kylin ... > Send mail notification when runtime exception throws during build/merge cube > > > Key: KYLIN-1827 > URL: https://issues.apache.org/jira/browse/KYLIN-1827 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Affects Versions: v1.5.1, v1.5.2 >Reporter: Ma Gang >Assignee: Ma Gang > > Currently a mail notification is only sent in the onExecuteFinished() method, > but no notification is sent when a RuntimeException is thrown, which may cause > users to miss an important job build failure, especially for automated > merge jobs. Sometimes the job state update fails (the hbase metastore is > unavailable for a short time), which makes the job always look like it is in a > running state when it has actually failed; a mail should be sent to notify the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1827) Send mail notification when runtime exception throws during build/merge cube
[ https://issues.apache.org/jira/browse/KYLIN-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363811#comment-15363811 ] Richard Calaba commented on KYLIN-1827: --- Hello, I wonder where the SMTP server used for sending the mail notifications is configured? I didn't find any setting, and also didn't find any document describing how to configure notifications in Kylin ... > Send mail notification when runtime exception throws during build/merge cube > > > Key: KYLIN-1827 > URL: https://issues.apache.org/jira/browse/KYLIN-1827 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Affects Versions: v1.5.1, v1.5.2 >Reporter: Ma Gang >Assignee: Ma Gang > > Currently a mail notification is only sent in the onExecuteFinished() method, > but no notification is sent when a RuntimeException is thrown, which may cause > users to miss an important job build failure, especially for automated > merge jobs. Sometimes the job state update fails (the hbase metastore is > unavailable for a short time), which makes the job always look like it is in a > running state when it has actually failed; a mail should be sent to notify the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KYLIN-1794) Enable job list even some job metadata parsing failed
[ https://issues.apache.org/jira/browse/KYLIN-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363803#comment-15363803 ] Richard Calaba edited comment on KYLIN-1794 at 7/6/16 5:32 AM: --- Hello, I have observed the same problem. I do not have a stack trace, but this is what had happened: - I had some finished and some ERROR jobs in my 1.5.2.1 Kylin - I compiled and installed the latest snapshot of Kylin-1.5.3-SNAPSHOT over the 1.5.2.1 ... After that, I was getting an error on the UI while visiting the "Monitor" page ... and the Monitor section was empty (even though jobs were listed there previously). The error didn't go away even when I downgraded back to 1.5.2.1 ... so I assume some metadata incompatibility between 1.5.2.1 and 1.5.3-SNAPSHOT ... I was having similar issues when testing earlier releases and switching between versions ... The only recovery option is a full cleanup of the metadata repository. Hope this helps to find this bug. I believe that the original reporter is simply asking for a solution that catches exceptions while loading the monitoring UI and, if any are thrown, logs them and continues with loading the next jobs ... was (Author: cal...@gmail.com): Hello, I have observed the same problem. I do not have a stack trace, but this is what had happened: - I had some finished and some ERROR jobs in my 1.5.2.1 Kylin - over this I compiled and installed the latest snapshot of Kylin-1.5.3-SNAPSHOT ... After that, I was getting an error on the UI while visiting the "Monitor" page ... the error didn't go away even when I downgraded back to 1.5.2.1 ... so I assume some metadata incompatibility between 1.5.2.1 and 1.5.3-SNAPSHOT ... I was having similar issues when testing earlier releases and switching between versions ... The only recovery option is a full cleanup of the metadata repository. Hope this helps to find this bug. I believe that the original reporter is simply asking for a solution that catches exceptions while loading the monitoring UI and, if any are thrown, logs them and continues with loading the next jobs ... > Enable job list even some job metadata parsing failed > - > > Key: KYLIN-1794 > URL: https://issues.apache.org/jira/browse/KYLIN-1794 > Project: Kylin > Issue Type: Bug > Components: Metadata >Reporter: Roger Shi >Assignee: Shaofeng SHI >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1794) Enable job list even some job metadata parsing failed
[ https://issues.apache.org/jira/browse/KYLIN-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363803#comment-15363803 ] Richard Calaba commented on KYLIN-1794: --- Hello, I have observed the same problem. I do not have a stack trace, but this is what had happened: - I had some finished and some ERROR jobs in my 1.5.2.1 Kylin - over this I compiled and installed the latest snapshot of Kylin-1.5.3-SNAPSHOT ... After that, I was getting an error on the UI while visiting the "Monitor" page ... the error didn't go away even when I downgraded back to 1.5.2.1 ... so I assume some metadata incompatibility between 1.5.2.1 and 1.5.3-SNAPSHOT ... I was having similar issues when testing earlier releases and switching between versions ... The only recovery option is a full cleanup of the metadata repository. Hope this helps to find this bug. I believe that the original reporter is simply asking for a solution that catches exceptions while loading the monitoring UI and, if any are thrown, logs them and continues with loading the next jobs ... > Enable job list even some job metadata parsing failed > - > > Key: KYLIN-1794 > URL: https://issues.apache.org/jira/browse/KYLIN-1794 > Project: Kylin > Issue Type: Bug > Components: Metadata >Reporter: Roger Shi >Assignee: Shaofeng SHI >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363777#comment-15363777 ] Richard Calaba commented on KYLIN-1834: --- Addition: so even dictionary encoding type Int, length 25 - same Value not found exception reached. The dimension has 10 mio. distinct IDs. I didn't find any way in Kylin 1.5.2.1 to process such dimension. Seems Kylin doesn't support high cardinality in current design. > java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build > Dimension Dictionary > -- > > Key: KYLIN-1834 > URL: https://issues.apache.org/jira/browse/KYLIN-1834 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Richard Calaba >Priority: Blocker > Attachments: job_2016_06_28_09_59_12-value-not-found.zip > > > Getting exception in Step 4 - Build Dimension Dictionary: > java.lang.IllegalArgumentException: Value not exists! > at > org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160) > at > org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76) > at > org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96) > at > org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106) > at > org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) > at > org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > 
org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > result code:2 > The code which generates the exception is: > org.apache.kylin.dimension.Dictionary.java: > /** > * A lower level API, return ID integer from raw value bytes. In case of > not found > * > * - if roundingFlag=0, throw IllegalArgumentException; > * - if roundingFlag<0, the closest smaller ID integer if exist; > * - if roundingFlag>0, the closest bigger ID integer if exist. > * > * Bypassing the cache layer, this could be significantly slower than > getIdFromValue(T value). > * > * @throws IllegalArgumentException > * if value is not found in dictionary and rounding is off; > * or if rounding cannot find a smaller or bigger ID > */ > final public int getIdFromValueBytes(byte[] value, int offset, int len, > int roundingFlag) throws IllegalArgumentException { > if (isNullByteForm(value, offset, len)) > return nullId(); > else { > int id = getIdFromValueBytesImpl(value, offset, len, > roundingFlag); > if (id < 0) > throw new IllegalArgumentException("Value not exists!"); > return id; > } > } > == > The Cube is big - fact 110 mio rows, the largest dimension (customer) has 10 > mio rows. 
I have increased the JVM -Xmx to 16gb and set the > kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube > build doesn't fail (previously we were getting exception complaining about > the 300MB limit for Dimension dictionary size (req. approx 700MB)). > == > Before that we were getting exception complaining about the Dictionary > encoding problem - "Too high cardinality is not suitable for dictionary -- > cardinality: 10873977" - this we resolved by changing the affected > dimension/row key Encoding from
[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363767#comment-15363767 ] Richard Calaba commented on KYLIN-1834: --- I did try everything possible. The only thing I didn't try was to switch off dictionary encoding completely for this dimension ... is that possible? How? I tried: - fixed-length encoding with length 20 - Int encoding with max. size 8 - even with size 10 (even though the UI complains that 8 is the max) Always got the same exception -> Value not exists! Now trying a final test - Int with length 25 - not sure what the length relates to - whether # of bytes or # of decimal digits ... - so trying length 25 > java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build > Dimension Dictionary > -- > > Key: KYLIN-1834 > URL: https://issues.apache.org/jira/browse/KYLIN-1834 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Richard Calaba >Priority: Blocker > Attachments: job_2016_06_28_09_59_12-value-not-found.zip > > > Getting exception in Step 4 - Build Dimension Dictionary: > java.lang.IllegalArgumentException: Value not exists! 
> at > org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160) > at > org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76) > at > org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96) > at > org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106) > at > org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) > at > org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > result code:2 > The code which generates the exception is: > org.apache.kylin.dimension.Dictionary.java: > /** > * A lower level API, return ID integer from raw value bytes. 
In case of > not found > * > * - if roundingFlag=0, throw IllegalArgumentException; > * - if roundingFlag<0, the closest smaller ID integer if exist; > * - if roundingFlag>0, the closest bigger ID integer if exist. > * > * Bypassing the cache layer, this could be significantly slower than > getIdFromValue(T value). > * > * @throws IllegalArgumentException > * if value is not found in dictionary and rounding is off; > * or if rounding cannot find a smaller or bigger ID > */ > final public int getIdFromValueBytes(byte[] value, int offset, int len, > int roundingFlag) throws IllegalArgumentException { > if (isNullByteForm(value, offset, len)) > return nullId(); > else { > int id = getIdFromValueBytesImpl(value, offset, len, > roundingFlag); > if (id < 0) > throw new IllegalArgumentException("Value not exists!"); > return id; > } > } > == > The Cube is big - fact 110 mio rows, the largest dimension (customer) has 10 > mio rows. I have increased the JVM -Xmx to 16gb and set the > kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube > build doesn't fail (previously we were getting exception complaining about > the 300MB limit for Dimension dictionary size (req. approx 700MB)). > == > Before that we were
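[Editorial note] Regarding the question above about what the Int encoding length refers to: as far as I can tell (worth verifying against the dimension-encoding classes in the Kylin source), the length is a byte count, not a count of decimal digits, so 8 bytes already cover any 64-bit id and a length of 25 would not be meaningful. A hedged sketch of fixed-width big-endian integer encoding under that assumption — the method names here are mine, not Kylin's:

```java
// Illustrative fixed-width integer encoding: a value is written into a fixed
// number of big-endian bytes. Under the "length = byte count" assumption,
// length 8 suffices for any Java long / SQL bigint (non-negative ids shown).
public class IntEncodeDemo {
    static byte[] encode(long value, int lenBytes) {
        byte[] out = new byte[lenBytes];
        for (int i = lenBytes - 1; i >= 0; i--) {
            out[i] = (byte) (value & 0xFF); // lowest byte goes last (big-endian)
            value >>>= 8;
        }
        return out;
    }

    static long decode(byte[] in) {
        long v = 0;
        for (byte b : in)
            v = (v << 8) | (b & 0xFF); // accumulate bytes as unsigned
        return v;
    }

    public static void main(String[] args) {
        long id = 10_873_977L;            // the customer_id cardinality reported above
        byte[] enc = encode(id, 8);       // 8 bytes, the UI's maximum
        System.out.println(decode(enc));  // round-trips: 10873977
    }
}
```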
[jira] [Commented] (KYLIN-1835) Error: java.lang.NumberFormatException: For input count_distinct on Big Int ??? (#7 Step Name: Build Base Cuboid Data)
[ https://issues.apache.org/jira/browse/KYLIN-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363743#comment-15363743 ] Richard Calaba commented on KYLIN-1835: --- What if I use a Bigint ID but I know that I have fewer than Integer.MAX_VALUE distinct values in my dimension? Any chance the code can be adjusted to support this? > Error: java.lang.NumberFormatException: For input count_distinct on Big Int > ??? (#7 Step Name: Build Base Cuboid Data) > -- > > Key: KYLIN-1835 > URL: https://issues.apache.org/jira/browse/KYLIN-1835 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Richard Calaba >Priority: Minor > > I believe I have discovered an error in Kylin related to count_distinct with > exact precision. > I am not 100% sure - but everything points to the fact that there is a design limit > for count_distinct ... please assess / confirm / reject my observation. > Background info: > = > - large fact table ~ 100 mio rows. > - large customer dimension ~ 10 mio rows > Defined 2 KPIs of type COUNT_DISTINCT with exact precision (return type > bitmap) on 2 high-cardinality fields of type Bigint (# of values expected for > one measure max 15 000 000 distinct values; the 2nd measure can have more > distinct values - approx. 50 mio, just an estimate). > Error info: > > Cube Build runs fine till #7 Step Name: Build Base Cuboid Data - where it > errors out without further details in the Kylin Log - it shows only "no counters > for job job_1463699962519_16085". 
> The MR Logs of the job job_1463699962519_16085 sow exceptions: > 2016-06-28 02:22:24,019 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : java.lang.NumberFormatException: For input string: > "-6628245177096591402" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Integer.parseInt(Integer.java:495) > at java.lang.Integer.parseInt(Integer.java:527) > at > org.apache.kylin.measure.bitmap.BitmapCounter.add(BitmapCounter.java:63) > at > org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:106) > at > org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:98) > at > org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValueOf(BaseCuboidMapperBase.java:189) > at > org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValue(BaseCuboidMapperBase.java:159) > at > org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.outputKV(BaseCuboidMapperBase.java:206) > at > org.apache.kylin.engine.mr.steps.HiveToBaseCuboidMapper.map(HiveToBaseCuboidMapper.java:53) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) > Just reading the signature of the exception and connecting the Measure > precision return type "bitmap" => looks like that because I have chosen exact > precision (which on UI says supported for int types) is causing this > exception because I am passing Bigint field > If so -> is that a bug (refactory for big int needed) or is it design > limitation ??? 
Can count_distinct be implemented for bigint (with > exact precision), or do I have to use count_distinct with an error rate instead? > In case I do not need to calculate the count_distinct for all dimension > combinations - I might add some mandatory dimensions to the aggregation > group - but I am not sure if this would resolve the issue (assuming I keep the > exact precision counts) ... ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
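The stack trace in KYLIN-1835 points at BitmapCounter.add parsing the raw dimension value with Integer.parseInt. A minimal standalone illustration (not Kylin code) of why any bigint value outside the 32-bit range makes the build fail, while the very same string parses fine as a 64-bit long:

```java
// Minimal standalone illustration (not Kylin code): the MR failure comes from
// parsing a bigint dimension value with Integer.parseInt, which only accepts
// values in the 32-bit int range; the same string parses fine as a long.
public class BitmapIntLimit {
    // Mimics what the stack trace shows BitmapCounter.add doing: an int parse.
    static boolean fitsInBitmapAsInt(String rawValue) {
        try {
            Integer.parseInt(rawValue);
            return true;
        } catch (NumberFormatException e) {
            return false; // bigint values overflow the int parse -> build fails
        }
    }

    public static void main(String[] args) {
        String bigintValue = "-6628245177096591402"; // value from the MR log
        System.out.println("parses as long: " + Long.parseLong(bigintValue));
        System.out.println("accepted by an int-based bitmap: " + fitsInBitmapAsInt(bigintValue));
    }
}
```

This suggests an exact-precision bigint counter would need either a 64-bit-capable bitmap structure or a prior dictionary encoding of the values into the int range, which is presumably why the error-rate variant is the usual workaround.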
[jira] [Commented] (KYLIN-1844) High cardinality dimensions in memory
[ https://issues.apache.org/jira/browse/KYLIN-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363738#comment-15363738 ] Richard Calaba commented on KYLIN-1844: --- How can we switch off the dictionary for high-cardinality dimensions? > High cardinality dimensions in memory > - > > Key: KYLIN-1844 > URL: https://issues.apache.org/jira/browse/KYLIN-1844 > Project: Kylin > Issue Type: Improvement > Components: Query Engine >Affects Versions: v1.2, v1.5.2 >Reporter: Abhilash L L >Assignee: liyang > > A whole dimension is kept in memory. > We should have a way to keep only a certain number / size of the total rows in memory. An LRU cache for rows in the dimension will help keep memory in check. > Why not store all the dimension data in HBase in a different table, with a prefix of the dimension id, and map all calls to the dimensions (get based on dim key) to HBase? > This does mean it will cost more time on a miss. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
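The LRU cache suggested in KYLIN-1844 above can be sketched with java.util.LinkedHashMap in access order. This is a hypothetical illustration, not Kylin code; the class name and capacity handling are made up for the sketch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: cap how many dimension rows stay in memory,
// evicting the least-recently-used entry once the limit is exceeded.
public class DimensionRowCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public DimensionRowCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true -> LRU iteration order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evicted rows would be re-fetched from the proposed HBase table on a miss.
        return size() > capacity;
    }
}
```

On a miss, the caller would fetch the row from the proposed HBase dimension table and put it back into the cache, paying the extra round trip the reporter mentions.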
[jira] [Commented] (KYLIN-1775) Add Cube Migrate Support for Global Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363728#comment-15363728 ] Richard Calaba commented on KYLIN-1775: --- What is the attached patch applicable to? I tried it against kylin-1.5.2.1 (the latest release) and it didn't succeed ... or is it against the current master branch? > Add Cube Migrate Support for Global Dictionary > -- > > Key: KYLIN-1775 > URL: https://issues.apache.org/jira/browse/KYLIN-1775 > Project: Kylin > Issue Type: Improvement > Components: Metadata >Affects Versions: v1.5.3 >Reporter: Yerui Sun >Assignee: Yerui Sun > Fix For: v1.5.3 > > Attachments: KYLIN-1775.patch > > > Since KYLIN-1705, we've introduced the global dictionary. The global dictionary > serializes dict data into HDFS storage directly, instead of saving it in the HBase > resource store. However, when a cube was migrated from one metadata store to another, > the global dict data was not copied to the new metadata. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-1847) Cleanup of Intermediate tables not working well
[ https://issues.apache.org/jira/browse/KYLIN-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Calaba updated KYLIN-1847: -- Description: I have realized that the Hive tables kylin_intermediate__ are not cleaned properly after cancelling all pending build jobs and dropping the cube. It could be that I didn't execute Purge before Dropping the cube ... just a theory, not 100% sure. I also suspect that on HDFS in the /kylin/kylin_metadata/ directory I have too much uncleaned data ... considering that I have just now only 1 cube having a pending build job, I see too many subdirectories there ... There might be some relation to the JIRA I already reported as well ... https://issues.apache.org/jira/browse/KYLIN-1828 - but again, not 100% sure this is the sole reason. My impression is that the cleanup logic after the Drop cube needs to be re-checked. was: I have realized that Hive tables kylin_intermediate__ after cancelling all pending build jobs and dropping the cube are no cleaned properly. It could be that I didn't execute Purge befor Dropping the cube ... just a theory, not 100% sure. I also suspect that on hdfs in the /kylin/kylin_metadata/ directory I have too many uncleaned data ... considering that I have just now only 1 cube wih pending build job I see too many subdirectories there ... There might be some relation to he JIRA I already reported as well ... https://issues.apache.org/jira/browse/KYLIN-1828 - but again not 100% sure this is the sole reason. My impression is that the cleanup logic after the Drop cube needs to be re-checked. > Cleanup of Intermediate tables not working well > --- > > Key: KYLIN-1847 > URL: https://issues.apache.org/jira/browse/KYLIN-1847 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Richard Calaba > > I have realized that the Hive tables > kylin_intermediate__ are not cleaned properly after cancelling all pending > build jobs and dropping the cube. 
> It could be that I didn't execute Purge before Dropping the cube ... just a > theory, not 100% sure. > I also suspect that on HDFS in the /kylin/kylin_metadata/ directory I have > too much uncleaned data ... considering that I have just now only 1 cube > having a pending build job, I see too many subdirectories there ... > There might be some relation to the JIRA I already reported as well ... > https://issues.apache.org/jira/browse/KYLIN-1828 - but again, not 100% sure > this is the sole reason. > My impression is that the cleanup logic after the Drop cube needs to be > re-checked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KYLIN-1847) Cleanup of Intermediate tables not working well
Richard Calaba created KYLIN-1847: - Summary: Cleanup of Intermediate tables not working well Key: KYLIN-1847 URL: https://issues.apache.org/jira/browse/KYLIN-1847 Project: Kylin Issue Type: Bug Affects Versions: v1.5.2, v1.5.2.1 Reporter: Richard Calaba I have realized that the Hive tables kylin_intermediate__ are not cleaned properly after cancelling all pending build jobs and dropping the cube. It could be that I didn't execute Purge before Dropping the cube ... just a theory, not 100% sure. I also suspect that on HDFS in the /kylin/kylin_metadata/ directory I have too much uncleaned data ... considering that I have just now only 1 cube with a pending build job, I see too many subdirectories there ... There might be some relation to the JIRA I already reported as well ... https://issues.apache.org/jira/browse/KYLIN-1828 - but again, not 100% sure this is the sole reason. My impression is that the cleanup logic after the Drop cube needs to be re-checked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1388) Different realization under one model could share some cubing steps
[ https://issues.apache.org/jira/browse/KYLIN-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360395#comment-15360395 ] Richard Calaba commented on KYLIN-1388: --- Also, dimension statistics (dictionaries) might be shareable across cubes if one lookup table is used in 2 different cubes (different models) with the same additional properties (dictionary encoding, ...). > Different realization under one model could share some cubing steps > --- > > Key: KYLIN-1388 > URL: https://issues.apache.org/jira/browse/KYLIN-1388 > Project: Kylin > Issue Type: Improvement >Reporter: hongbin ma >Assignee: hongbin ma > > The data model behind each realization (cube) has shared resources, most > significantly the flattened Hive table and the dictionaries. A realization > can check if another realization (with the same model) has already > created the shared resources. If yes, it can directly skip these steps to save > time/resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (KYLIN-1837) Feature request - cross cube reuse of Kylin fact/lookup snapshots ...
[ https://issues.apache.org/jira/browse/KYLIN-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Calaba closed KYLIN-1837. - Resolution: Duplicate > Feature request - cross cube reuse of Kylin fact/lookup snapshots ... > - > > Key: KYLIN-1837 > URL: https://issues.apache.org/jira/browse/KYLIN-1837 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Affects Versions: all >Reporter: Richard Calaba >Assignee: Dong Li > > Hello Kylin gurus, > while debugging some issues with high-cardinality dimensions - which obviously requires large data to be processed to emulate the problem, so the Cube Build process takes significant time ... I came to this idea: > - Can the snapshot logic be reused across cubes? > - Let's say I have cube 1 and cube 2, which is a clone of cube 1, maybe with some dimensions removed, or even having the same dimensions and just a different measures definition ... > - The Cube 1 build fails somewhere in the later steps (snapshots already built in step 1, I believe) > - Running the build of the 2nd cube - which, let's say, uses exactly the same dimension tables and in fact also the same fact table - also takes a long time, because in Step 1 the build process is calculating the snapshots ... which were already calculated (and are still not discarded) by the build job of Cube 1 > Is there any chance to define some snapshot-reuse scenarios like that (same model/DB tables referred)? ... so the modelling time can be shortened while playing with the cube design? (i.e. testing various optimizations like joint dimensions, etc. - those should not be impacted by the source data stored in the already calculated snapshots, right?) > Obviously that should be an option while scheduling a Cube Build - to enable/disable the reuse of snapshots from other similar cubes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KYLIN-1836) Kylin 1.5+ New Aggregation Group - UI improvements
[ https://issues.apache.org/jira/browse/KYLIN-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360044#comment-15360044 ] Richard Calaba edited comment on KYLIN-1836 at 7/2/16 4:55 PM: --- Ad idea1 - yes - basically a similar idea - not specifically looking at a target size but more at a target compute complexity ... I will be following Idea 1 at KYLIN-1743. Ad idea2 - well the aim is this - you design the cube - and you have too many dimensions to calculate all cuboids. So you check your reporting requirements, try to come up with fewer dimensions, and define Aggregation Groups (mandatory / joint dimensions, hierarchies, etc.) to further optimize the cube build time. So you come up with an optimized cube, it builds fine - most of the queries are running ... but a few are not - because either you optimized too much, or some of the reporting requirements were not clear, or the assumptions about the data were not correct ... so the queries which fail with a no-realization exception should give you some feedback saying why no realization was found - from "main fact table not found in any cube" to "combination of dimensions a, b, c not supported by any cube". To overcome the problem of too many queries being reported there -> we can have a debug on/off switch, so you enable this for a certain period of time (or a short session) to debug the queries which are not finding any realization ... Ad idea3 - the rowkey is getting clearer now - thank you. I understand the reordering to optimize HBase scans. I can guess some optimizations in the case of dictionary / fixed encodings. I understand the "int" encoding a little less - especially because it seems not to work for Bigint. What I am totally confused about is this: what is going to happen if I remove one of the dimensions from the rowkey ... will this dimension still be queryable? Will the whole thing work correctly? The UI allows deleting sections of the rowkey ... 
was (Author: cal...@gmail.com): Ad idea1 - yes - basically a similar idea - not specifically looking at a target size but more at a target compute complexity ... I will be following Idea 1 at KYLIN-1743. Ad idea2 - well the aim is this - you design the cube - and you have too many dimensions to calculate all cuboids. So you check your reporting requirements, try to come up with fewer dimensions, and define Aggregation Groups (mandatory / joint dimensions, hierarchies, etc.) to further optimize the cube build time. So you come up with an optimized cube, it builds fine - most of the queries are running ... but a few are not - because either you optimized too much, or some of the reporting requirements were not clear, or the assumptions about the data were not correct ... so the queries which fail with a no-realization exception should give you some feedback saying why no realization was found - from "main fact table not found in any cube" to "combination of dimensions a, b, c not supported by any cube". To overcome the problem of too many queries being reported there -> we can have a debug on/off switch, so you enable this for a certain period of time (or a short session) to debug the queries which are not finding any realization ... Ad idea3 - the rowkey is getting clearer now - thank you. I understand the reordering to optimize HBase scans. I can guess some optimizations in the case of dictionary / fixed encodings. I understand the "int" encoding a little less - especially because it seems not to work for Bigint. What I am totally confused about is this: what is going to happen if I remove one of the dimensions from the rowkey ... will this dimension still be queryable? Will the whole thing work correctly? The UI allows deleting sections of the rowkey ... 
> Kylin 1.5+ New Aggregation Group - UI improvements > -- > > Key: KYLIN-1836 > URL: https://issues.apache.org/jira/browse/KYLIN-1836 > Project: Kylin > Issue Type: Improvement >Affects Versions: v1.5.0, v1.5.1, v1.5.2, v1.5.3, v1.5.2.1 >Reporter: Richard Calaba > > After reading the Tech Blog > https://kylin.apache.org/blog/2016/02/18/new-aggregation-group/ from Hongbin Ma, I got a few ideas, mentioned below, to help Cube designers understand the impact of their cube design on Build and Query performance: > (BTW: thank you for putting this Blog together!!! And thank you for referencing this blog through the Kylin UI - the link in the Aggregation Groups section!! It is a very powerful optimization technique.) > Idea 1 > = > It would be great if the Advanced Settings section of the UI could calculate the exact number of Cuboids defined by every Aggregation Group (# of combinations; # of pruned combinations (based on Hier/Joint and Mandatory Dimensions)), and then also show the overall total of Cuboids considering
[jira] [Comment Edited] (KYLIN-1836) Kylin 1.5+ New Aggregation Group - UI improvements
[ https://issues.apache.org/jira/browse/KYLIN-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360044#comment-15360044 ] Richard Calaba edited comment on KYLIN-1836 at 7/2/16 7:31 AM: --- Ad idea1 - yes - basically a similar idea - not specifically looking at a target size but more at a target compute complexity ... I will be following Idea 1 at KYLIN-1743. Ad idea2 - well the aim is this - you design the cube - and you have too many dimensions to calculate all cuboids. So you check your reporting requirements, try to come up with fewer dimensions, and define Aggregation Groups (mandatory / joint dimensions, hierarchies, etc.) to further optimize the cube build time. So you come up with an optimized cube, it builds fine - most of the queries are running ... but a few are not - because either you optimized too much, or some of the reporting requirements were not clear, or the assumptions about the data were not correct ... so the queries which fail with a no-realization exception should give you some feedback saying why no realization was found - from "main fact table not found in any cube" to "combination of dimensions a, b, c not supported by any cube". To overcome the problem of too many queries being reported there -> we can have a debug on/off switch, so you enable this for a certain period of time (or a short session) to debug the queries which are not finding any realization ... Ad idea3 - the rowkey is getting clearer now - thank you. I understand the reordering to optimize HBase scans. I can guess some optimizations in the case of dictionary / fixed encodings. I understand the "int" encoding a little less - especially because it seems not to work for Bigint. What I am totally confused about is this: what is going to happen if I remove one of the dimensions from the rowkey ... will this dimension still be queryable? Will the whole thing work correctly? The UI allows deleting sections of the rowkey ... 
was (Author: cal...@gmail.com): Ad idea1 - yes - basically a similar idea - not specifically looking at a target size but more at a target compute complexity ... I will be following Idea 1 at KYLIN-1743. Ad idea2 - well the aim is this - you design the cube - and you have too many dimensions to calculate all cuboids. So you check your reporting requirements, try to come up with fewer dimensions, and define Aggregation Groups (mandatory / joint dimensions, hierarchies, etc.) to further optimize the cube build time. So you come up with an optimized cube, it builds fine - most of the queries are running ... but a few are not - because either you optimized too much, or some of the reporting requirements were not clear, or the assumptions about the data were not correct ... so the queries which fail with a no-realization exception should give you some feedback saying why no realization was found - from "main fact table not found in any cube" to "combination of dimensions a, b, c not supported by any cube". Ad idea3 - the rowkey is getting clearer now - thank you. I understand the reordering to optimize HBase scans. I can guess some optimizations in the case of dictionary / fixed encodings. I understand the "int" encoding a little less - especially because it seems not to work for Bigint. What I am totally confused about is this: what is going to happen if I remove one of the dimensions from the rowkey ... will this dimension still be queryable? Will the whole thing work correctly? The UI allows deleting sections of the rowkey ... 
> Kylin 1.5+ New Aggregation Group - UI improvements > -- > > Key: KYLIN-1836 > URL: https://issues.apache.org/jira/browse/KYLIN-1836 > Project: Kylin > Issue Type: Improvement >Affects Versions: v1.5.0, v1.5.1, v1.5.2, v1.5.3, v1.5.2.1 >Reporter: Richard Calaba > > After reading the Tech Blog > https://kylin.apache.org/blog/2016/02/18/new-aggregation-group/ from Hongbin Ma, I got a few ideas, mentioned below, to help Cube designers understand the impact of their cube design on Build and Query performance: > (BTW: thank you for putting this Blog together!!! And thank you for referencing this blog through the Kylin UI - the link in the Aggregation Groups section!! It is a very powerful optimization technique.) > Idea 1 > = > It would be great if the Advanced Settings section of the UI could calculate the exact number of Cuboids defined by every Aggregation Group (# of combinations; # of pruned combinations (based on Hier/Joint and Mandatory Dimensions)), and then also show the overall total of Cuboids considering ALL the defined Aggregation Groups. > Idea 2 > = > The Aggregation Group section is about optimizing the # of necessary cuboids, assuming you know the query patterns. This is sometimes easy, but for more complex dashboards
[jira] [Commented] (KYLIN-1836) Kylin 1.5+ New Aggregation Group - UI improvements
[ https://issues.apache.org/jira/browse/KYLIN-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360044#comment-15360044 ] Richard Calaba commented on KYLIN-1836: --- Ad idea1 - yes - basically a similar idea - not specifically looking at a target size but more at a target compute complexity ... I will be following Idea 1 at KYLIN-1743. Ad idea2 - well the aim is this - you design the cube - and you have too many dimensions to calculate all cuboids. So you check your reporting requirements, try to come up with fewer dimensions, and define Aggregation Groups (mandatory / joint dimensions, hierarchies, etc.) to further optimize the cube build time. So you come up with an optimized cube, it builds fine - most of the queries are running ... but a few are not - because either you optimized too much, or some of the reporting requirements were not clear, or the assumptions about the data were not correct ... so the queries which fail with a no-realization exception should give you some feedback saying why no realization was found - from "main fact table not found in any cube" to "combination of dimensions a, b, c not supported by any cube". Ad idea3 - the rowkey is getting clearer now - thank you. I understand the reordering to optimize HBase scans. I can guess some optimizations in the case of dictionary / fixed encodings. I understand the "int" encoding a little less - especially because it seems not to work for Bigint. What I am totally confused about is this: what is going to happen if I remove one of the dimensions from the rowkey ... will this dimension still be queryable? Will the whole thing work correctly? The UI allows deleting sections of the rowkey ... 
> Kylin 1.5+ New Aggregation Group - UI improvements > -- > > Key: KYLIN-1836 > URL: https://issues.apache.org/jira/browse/KYLIN-1836 > Project: Kylin > Issue Type: Improvement >Affects Versions: v1.5.0, v1.5.1, v1.5.2, v1.5.3, v1.5.2.1 >Reporter: Richard Calaba > > After reading the Tech Blog > https://kylin.apache.org/blog/2016/02/18/new-aggregation-group/ from Hongbin Ma, I got a few ideas, mentioned below, to help Cube designers understand the impact of their cube design on Build and Query performance: > (BTW: thank you for putting this Blog together!!! And thank you for referencing this blog through the Kylin UI - the link in the Aggregation Groups section!! It is a very powerful optimization technique.) > Idea 1 > = > It would be great if the Advanced Settings section of the UI could calculate the exact number of Cuboids defined by every Aggregation Group (# of combinations; # of pruned combinations (based on Hier/Joint and Mandatory Dimensions)), and then also show the overall total of Cuboids considering ALL the defined Aggregation Groups. > Idea 2 > = > The Aggregation Group section is about optimizing the # of necessary cuboids, assuming you know the query patterns. This is sometimes easy, but for more complex dashboards, where multiple people work on defining the queries, it is hard to control and guess; thus I would suggest adding a new tab in the Monitor Kylin UI - next to Jobs and Slow Queries - an additional tab "Non-satisfied Queries", showing the queries which Kylin was not able to evaluate - queries which end with a "No Realization" exception. Together with the Query SQL (including all the parameters), it would help to show the "missing dimension name" used in the query which was the cause of not finding a proper Cuboid. > Idea 3 > = > Can anyone also document the Rowkeys section in the same section of the UI (Advanced Settings)? 
It is not really clear what effect it will have if I start playing with the Rowkeys section (adding/removing dimension fields; adding non-dimension fields, ...). All I understand is that the "Rowkeys" section has an impact only on the HBase storage of calculated cuboids. Thus it doesn't have much impact on Cube Build time (except that the Trie for the dictionary needs to be built for every specified rowkey on this tab). I understand that the major impact of the Rowkeys section is thus only on HBase size / region splits, and thus also on query execution time. > What I am confused about is whether I can define a high-cardinality dimension in the Cube and remove it from the Rowkeys section? What would happen to HBase storage and the expected query time ... would that dimension still be query-enabled? > The closest explanation I found is this reply from Yu Feng here: > http://apache-kylin.74782.x6.nabble.com/Relationship-between-rowkey-column-length-and-cube-size-td3174.html > == > Reply: Cube size determines how to split regions for the table in HBase after generating > all cuboids
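For illustration of the rowkey-encoding discussion above, here is a hypothetical sketch (not Kylin's actual encoder) of what a fixed-length integer encoding does: the value is written as a declared number of big-endian bytes, so every rowkey cell has equal width and sorts byte-wise in value order. This is also why a length must be chosen up front, and why values outside that range (or sign handling, as with bigint) are a problem:

```java
// Hypothetical sketch (not Kylin's actual implementation) of a fixed-length
// integer rowkey encoding: each dimension value is stored as `len` big-endian
// bytes, giving equal-width keys that sort byte-wise in value order.
public class FixedLenIntEncoding {
    static byte[] encode(long value, int len) {
        byte[] out = new byte[len];
        for (int i = len - 1; i >= 0; i--) {
            out[i] = (byte) (value & 0xFF); // low byte goes to the rightmost slot
            value >>>= 8;
        }
        return out;
    }

    static long decode(byte[] bytes) {
        long v = 0;
        for (byte b : bytes) {
            v = (v << 8) | (b & 0xFF); // rebuild the value from big-endian bytes
        }
        return v;
    }
}
```

The round trip works only for non-negative values that fit in `len` bytes; a real encoder must also handle sign and overflow, which hints at why a length is declared up front and why an int-style encoding can misbehave on bigint columns.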
[jira] [Comment Edited] (KYLIN-1830) Put KYLIN_JVM_SETTINGS to kylin.properties
[ https://issues.apache.org/jira/browse/KYLIN-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359948#comment-15359948 ] Richard Calaba edited comment on KYLIN-1830 at 7/2/16 3:18 AM: --- Okay - fair enough - but I prefer having the sizing-related configuration parameters in one location (/conf), as we have other performance and execution-environment settings there. Thus I have found a generic solution you can use for ALL variables in setenv.sh - with it you can override the parameters from setenv.sh with values specified in the conf/kylin.properties file, using the same name as the environment variable. See the attached & updated setenv.sh and kylin.properties files (both from Kylin 1.5.2.1). Background info about the implementation: The trick is that I have defined 2 bash functions which try to read the property value override from conf/kylin.properties - if found, the value specified there is used; if not found (or the override is commented out with #), the default value specified in the bin/setenv.sh script is used. Feel free to include it in the standard packaging - I was thinking of providing a patch here, but didn't find setenv.sh in the source code - probably generated? Functions defined - see the setenv.sh attachment. 
Example from setenv.sh - how to override the variable KYLIN_JVM_SETTINGS:

Instead of:

export KYLIN_JVM_SETTINGS=""

Use:

export_property_override KYLIN_JVM_SETTINGS ""

I.e.:

export_property_override KYLIN_JVM_SETTINGS "-Xms1024M -Xmx4096M -XX:MaxPermSize=512M -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$KYLIN_HOME/logs/kylin.gc.$$ -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"

Then in conf/kylin.properties you can put the line:

KYLIN_JVM_SETTINGS="-Xms1024M -Xmx16g -XX:MaxPermSize=512M -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$KYLIN_HOME/logs/kylin.gc.$$ -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"

which will override the default value provided in setenv.sh. Tested and working. Enjoy!

was (Author: cal...@gmail.com): Okay - fair enough - but I prefer having the sizing-related configuration parameters in one location (/conf), as we have other performance and execution-environment settings there. Thus I have found a generic solution you can use for ALL variables in setenv.sh - with it you can override the parameters from setenv.sh with values specified in the conf/kylin.properties file, using the same name as the environment variable. See the attached & updated setenv.sh and kylin.properties files (both from Kylin 1.5.2.1). Background info about the implementation: The trick is that I have defined 2 bash functions which try to read the property value override from conf/kylin.properties - if found, the value specified there is used; if not found (or the override is commented out with #), the default value specified in the bin/setenv.sh script is used. Feel free to include it in the standard packaging - I was thinking of providing a patch here, but didn't find setenv.sh in the source code - probably generated?
Functions Defined:

function parse_properties() {
    local param_value=$(awk -F "=" "(!/^(\$|[[:space:]]*#)/) && (/$2/) { idx = index(\$0,\"=\"); print substr(\$0,idx+1)}" "$1" | sed -e 's/^"//' -e 's/"$//')
    if [[ -z "${param_value}" ]]; then
        echo $3
    else
        echo `eval echo "${param_value}"`
    fi
}

function export_property_override() {
    # default Kylin property file location - for environment values override
    local kylin_property_file=${KYLIN_HOME}/conf/kylin.properties
    export "$1"="$(parse_properties "${kylin_property_file}" "$1" "$2")"
}

Example from setenv.sh - how to override the variable KYLIN_JVM_SETTINGS:

Instead of:

export KYLIN_JVM_SETTINGS=""

Use:

export_property_override KYLIN_JVM_SETTINGS ""

I.e.:

export_property_override KYLIN_JVM_SETTINGS "-Xms1024M -Xmx4096M -XX:MaxPermSize=512M -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$KYLIN_HOME/logs/kylin.gc.$$ -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"

Then in conf/kylin.properties you can put the line:

KYLIN_JVM_SETTINGS="-Xms1024M -Xmx16g -XX:MaxPermSize=512M -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$KYLIN_HOME/logs/kylin.gc.$$ -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"

which will override the default value provided in setenv.sh. Tested and working. Enjoy!

> Put KYLIN_JVM_SETTINGS to kylin.properties > -- > > Key: KYLIN-1830 > URL: https://issues.apache.org/jira/browse/KYLIN-1830 > Project: Kylin > Issue Type: Improvement >Reporter: Richard
[jira] [Commented] (KYLIN-1830) Put KYLIN_JVM_SETTINGS to kylin.properties
[ https://issues.apache.org/jira/browse/KYLIN-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359948#comment-15359948 ] Richard Calaba commented on KYLIN-1830: --- Okay - fair enough - but I prefer having the sizing-related configuration parameters in one location (/conf), as we have other performance and execution-environment settings there. Thus I have found a generic solution you can use for ALL variables in setenv.sh - with it you can override the parameters from setenv.sh with values specified in the conf/kylin.properties file, using the same name as the environment variable. See the attached & updated setenv.sh and kylin.properties files (both from Kylin 1.5.2.1). Background info about the implementation: The trick is that I have defined 2 bash functions which try to read the property value override from conf/kylin.properties - if found, the value specified there is used; if not found (or the override is commented out with #), the default value specified in the bin/setenv.sh script is used. Feel free to include it in the standard packaging - I was thinking of providing a patch here, but didn't find setenv.sh in the source code - probably generated? 
Functions Defined:

function parse_properties() {
    local param_value=$(awk -F "=" "(!/^(\$|[[:space:]]*#)/) && (/$2/) { idx = index(\$0,\"=\"); print substr(\$0,idx+1)}" "$1" | sed -e 's/^"//' -e 's/"$//')
    if [[ -z "${param_value}" ]]; then
        echo $3
    else
        echo `eval echo "${param_value}"`
    fi
}

function export_property_override() {
    # default Kylin property file location - for environment values override
    local kylin_property_file=${KYLIN_HOME}/conf/kylin.properties
    export "$1"="$(parse_properties "${kylin_property_file}" "$1" "$2")"
}

Example from setenv.sh - how to override the variable KYLIN_JVM_SETTINGS:

Instead of:

export KYLIN_JVM_SETTINGS=""

Use:

export_property_override KYLIN_JVM_SETTINGS ""

I.e.:

export_property_override KYLIN_JVM_SETTINGS "-Xms1024M -Xmx4096M -XX:MaxPermSize=512M -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$KYLIN_HOME/logs/kylin.gc.$$ -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"

Then in conf/kylin.properties you can put the line:

KYLIN_JVM_SETTINGS="-Xms1024M -Xmx16g -XX:MaxPermSize=512M -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$KYLIN_HOME/logs/kylin.gc.$$ -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"

which will override the default value provided in setenv.sh. Tested and working. Enjoy!

> Put KYLIN_JVM_SETTINGS to kylin.properties > -- > > Key: KYLIN-1830 > URL: https://issues.apache.org/jira/browse/KYLIN-1830 > Project: Kylin > Issue Type: Improvement >Reporter: Richard Calaba >Priority: Minor > Attachments: kylin.properties, setenv.sh > > > Currently the KYLIN_JVM_SETTINGS variable is stored in ./bin/setenv.sh > ... which is not wrong, but as we also have some other memory-specific settings in the ./conf/kylin.properties file (e.g. kylin.job.mapreduce.default.reduce.input.mb or kylin.table.snapshot.max_mb), it might be a good idea to have these performance and sizing related parameters in one location. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-1830) Put KYLIN_JVM_SETTINGS to kylin.properties
[ https://issues.apache.org/jira/browse/KYLIN-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Calaba updated KYLIN-1830: -- Attachment: setenv.sh kylin.properties > Put KYLIN_JVM_SETTINGS to kylin.properties > -- > > Key: KYLIN-1830 > URL: https://issues.apache.org/jira/browse/KYLIN-1830 > Project: Kylin > Issue Type: Improvement >Reporter: Richard Calaba >Priority: Minor > Attachments: kylin.properties, setenv.sh > > > Currently the KYLIN_JVM_SETTINGS variable is stored in ./bin/setenv.sh > ... which is not wrong, but as we also have some other memory-specific > settings in the ./conf/kylin.properties file (e.g. > kylin.job.mapreduce.default.reduce.input.mb or kylin.table.snapshot.max_mb) > it might be a good idea to have those performance- and sizing-related parameters > in one location. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-1835) Error: java.lang.NumberFormatException: For input count_distinct on Big Int ??? (#7 Step Name: Build Base Cuboid Data)
[ https://issues.apache.org/jira/browse/KYLIN-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Calaba updated KYLIN-1835: -- Priority: Minor (was: Critical) > Error: java.lang.NumberFormatException: For input count_distinct on Big Int > ??? (#7 Step Name: Build Base Cuboid Data) > -- > > Key: KYLIN-1835 > URL: https://issues.apache.org/jira/browse/KYLIN-1835 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Richard Calaba >Priority: Minor > > I believe I have discovered an error in Kylin related to count_distinct with > exact precision. > I am not 100% sure - but everything points to the fact that there is a design limit > for count_distinct ... please assess / confirm / reject my observation. > Background info: > = > - large fact table ~ 100 million rows > - large customer dimension ~ 10 million rows > Defined 2 KPIs of type COUNT_DISTINCT - with exact precision (return type > bitmap) on 2 high-cardinality fields of type bigint (# of values expected for > one measure: max 15,000,000 distinct values; the 2nd measure can have more > distinct values, approx. 50 million (just an estimate)). > Error info: > > Cube Build runs fine till #7 Step Name: Build Base Cuboid Data - where it > errors out without further details in the Kylin log - it shows only "no counters > for job job_1463699962519_16085".
> The MR logs of the job job_1463699962519_16085 show exceptions: > 2016-06-28 02:22:24,019 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : java.lang.NumberFormatException: For input string: > "-6628245177096591402" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) > at java.lang.Integer.parseInt(Integer.java:495) > at java.lang.Integer.parseInt(Integer.java:527) > at > org.apache.kylin.measure.bitmap.BitmapCounter.add(BitmapCounter.java:63) > at > org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:106) > at > org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:98) > at > org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValueOf(BaseCuboidMapperBase.java:189) > at > org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValue(BaseCuboidMapperBase.java:159) > at > org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.outputKV(BaseCuboidMapperBase.java:206) > at > org.apache.kylin.engine.mr.steps.HiveToBaseCuboidMapper.map(HiveToBaseCuboidMapper.java:53) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) > Just reading the signature of the exception and connecting it to the measure > precision return type "bitmap" => it looks like my choosing exact > precision (which the UI says is supported for int types) is causing this > exception because I am passing a bigint field. > If so -> is it a bug (refactoring for bigint needed) or is it a design > limitation ???
Can count_distinct not be implemented for bigint (with > exact precision), or do I have to use count_distinct with an error rate instead > ??? > In case I do not need to calculate the count_distinct for all dimension > combinations - I might add some mandatory dimensions to the aggregation > group - but I am not sure if this would resolve the issue (assuming I keep the > exact precision counts) ... ??? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1835) Error: java.lang.NumberFormatException: For input count_distinct on Big Int ??? (#7 Step Name: Build Base Cuboid Data)
[ https://issues.apache.org/jira/browse/KYLIN-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356743#comment-15356743 ] Richard Calaba commented on KYLIN-1835: --- Confirmed - after changing the count_distinct from 'bitmap' (exact precision) to 'hllc16' (error rate < 1.22%) - the error no longer appears. So the final question is - is bigint not supported for count_distinct ??? Lowering priority due to resolution ...
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
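The root cause deduced in this thread can be reproduced with plain JDK calls, outside Kylin: the failing value from the MR log is a perfectly valid 64-bit bigint, but it overflows `Integer.parseInt`, which is exactly the call at the top of the quoted stack trace. A minimal sketch (not Kylin source code):

```java
// Sketch (not Kylin source): the exact/bitmap COUNT_DISTINCT path parses each
// value as a 32-bit int, so a bigint key blows up with NumberFormatException.
public class BigintParseDemo {

    // True when the string fits a 32-bit int, i.e. when an int-indexed
    // bitmap counter could store it directly.
    static boolean fitsInt(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String bigintKey = "-6628245177096591402"; // value from the MR log
        long asLong = Long.parseLong(bigintKey);   // succeeds: valid 64-bit bigint
        System.out.println("valid long: " + asLong);
        System.out.println("fits int:   " + fitsInt(bigintKey)); // false -> the exception path
    }
}
```

This matches the workaround confirmed above: an approximate counter such as hllc16 hashes the raw value and therefore never needs the value to fit in an int.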
[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356723#comment-15356723 ] Richard Calaba commented on KYLIN-1834: --- I have removed the count_distinct measures to make sure this is not the cause. The error is still there. Further I found out: 1) The missing-value complaint is about a field customer_id - which exists in the FACT table (as FK) and in the LOOKUP table as PK - LEFT OUTER JOIN 2) However, there is a 2nd dimension/LOOKUP - also LEFT OUTER JOIN - which uses 2 fields for the join (date and customer_id) - in this 2nd lookup table the value (of the customer_id field) which the exception is complaining about DOESN'T EXIST. However, as it is a LEFT OUTER JOIN - it doesn't have to ... To me that almost looks like a bug in Kylin code ... is the dictionary built incorrectly because I have the field used in 2 LOOKUPs ??? > java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build > Dimension Dictionary > -- > > Key: KYLIN-1834 > URL: https://issues.apache.org/jira/browse/KYLIN-1834 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Richard Calaba >Priority: Critical > Attachments: job_2016_06_28_09_59_12-value-not-found.zip > > > Getting exception in Step 4 - Build Dimension Dictionary: > java.lang.IllegalArgumentException: Value not exists!
> at > org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160) > at > org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76) > at > org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96) > at > org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106) > at > org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) > at > org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > result code:2 > The code which generates the exception is: > org.apache.kylin.dimension.Dictionary.java: > /** > * A lower level API, return ID integer from raw value bytes. 
In case of > not found > * > * - if roundingFlag=0, throw IllegalArgumentException; > * - if roundingFlag<0, the closest smaller ID integer if exist; > * - if roundingFlag>0, the closest bigger ID integer if exist. > * > * Bypassing the cache layer, this could be significantly slower than > getIdFromValue(T value). > * > * @throws IllegalArgumentException > * if value is not found in dictionary and rounding is off; > * or if rounding cannot find a smaller or bigger ID > */ > final public int getIdFromValueBytes(byte[] value, int offset, int len, > int roundingFlag) throws IllegalArgumentException { > if (isNullByteForm(value, offset, len)) > return nullId(); > else { > int id = getIdFromValueBytesImpl(value, offset, len, > roundingFlag); > if (id < 0) > throw new IllegalArgumentException("Value not exists!"); > return id; > } > } > == > The Cube is big - fact 110 mio rows, the largest dimension (customer) has 10 > mio rows. I have increased the JVM -Xmx to 16gb and set the > kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube > build
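The roundingFlag contract quoted above from Dictionary.java can be illustrated with a toy dictionary over a sorted value array. This is a simplified sketch of the documented behavior only, not the TrieDictionary implementation:

```java
import java.util.Arrays;

// Toy illustration of the Dictionary.getIdFromValueBytes rounding contract:
// roundingFlag == 0 -> throw if the value is absent;
// roundingFlag <  0 -> return the closest smaller ID if one exists;
// roundingFlag >  0 -> return the closest bigger ID if one exists.
public class RoundingLookup {

    static int idOf(long[] sortedValues, long value, int roundingFlag) {
        int i = Arrays.binarySearch(sortedValues, value);
        if (i >= 0)
            return i;                         // exact hit: ID is the array index
        int insertion = -i - 1;               // where the value would be inserted
        if (roundingFlag == 0)
            throw new IllegalArgumentException("Value not exists!");
        if (roundingFlag < 0) {
            if (insertion == 0)               // nothing smaller to round down to
                throw new IllegalArgumentException("Value not exists!");
            return insertion - 1;
        }
        if (insertion == sortedValues.length) // nothing bigger to round up to
            throw new IllegalArgumentException("Value not exists!");
        return insertion;
    }

    public static void main(String[] args) {
        long[] dict = {10, 20, 30};
        System.out.println(idOf(dict, 20, 0));  // exact match
        System.out.println(idOf(dict, 25, -1)); // rounds down to the ID of 20
        System.out.println(idOf(dict, 25, 1));  // rounds up to the ID of 30
    }
}
```

The snapshot-building code path in the stack trace calls the lookup with rounding off, which is why a lookup value missing from the dictionary surfaces as "Value not exists!" rather than being rounded.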
[jira] [Updated] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Calaba updated KYLIN-1834: -- Priority: Blocker (was: Critical) > java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build > Dimension Dictionary > -- > > Key: KYLIN-1834 > URL: https://issues.apache.org/jira/browse/KYLIN-1834 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Richard Calaba >Priority: Blocker > Attachments: job_2016_06_28_09_59_12-value-not-found.zip > > > Getting exception in Step 4 - Build Dimension Dictionary: > java.lang.IllegalArgumentException: Value not exists! > at > org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160) > at > org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76) > at > org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96) > at > org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106) > at > org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) > at > org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) > at > 
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > result code:2 > The code which generates the exception is: > org.apache.kylin.dimension.Dictionary.java: > /** > * A lower level API, return ID integer from raw value bytes. In case of > not found > * > * - if roundingFlag=0, throw IllegalArgumentException; > * - if roundingFlag<0, the closest smaller ID integer if exist; > * - if roundingFlag>0, the closest bigger ID integer if exist. > * > * Bypassing the cache layer, this could be significantly slower than > getIdFromValue(T value). > * > * @throws IllegalArgumentException > * if value is not found in dictionary and rounding is off; > * or if rounding cannot find a smaller or bigger ID > */ > final public int getIdFromValueBytes(byte[] value, int offset, int len, > int roundingFlag) throws IllegalArgumentException { > if (isNullByteForm(value, offset, len)) > return nullId(); > else { > int id = getIdFromValueBytesImpl(value, offset, len, > roundingFlag); > if (id < 0) > throw new IllegalArgumentException("Value not exists!"); > return id; > } > } > == > The Cube is big - fact 110 mio rows, the largest dimension (customer) has 10 > mio rows. I have increased the JVM -Xmx to 16gb and set the > kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube > build doesn't fail (previously we were getting exception complaining about > the 300MB limit for Dimension dictionary size (req. approx 700MB)). 
> == > Before that we were getting an exception complaining about the dictionary > encoding problem - "Too high cardinality is not suitable for dictionary -- > cardinality: 10873977" - this we resolved by changing the affected > dimension/row-key encoding from "dict" to "int; length=8" on the Advanced > Settings of the Cube. > == > We have 2 high-cardinality fields (one from the fact table and one from the big > dimension (customer - see above)). We need to use in distinct_count
[jira] [Updated] (KYLIN-1837) Feature request - cross cube reuse of Kylin fact/lookup snapshots ...
[ https://issues.apache.org/jira/browse/KYLIN-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Calaba updated KYLIN-1837: -- Description: Hello Kylin gurus, while debugging some issues with high-cardinality dimensions - which obviously requires large data to be processed to emulate the problem, thus the Cube Build process takes significant time ... I came to this idea: - Cannot the snapshot logic be reused across cubes ?? - Let's say I have cube 1 and cube 2, which is a clone of cube 1, maybe with some dimensions removed, or even having the same dimensions and just a different measures definition ... - Cube 1 build fails somewhere in the later steps (snapshots already built in step 1, I believe) - Running the build of the 2nd cube - which let's say uses exactly the same dimension tables and in fact also the same fact table - also requires a long run, because in Step 1 the build process is calculating the snapshots ... which are already calculated (and still not discarded) by the build job of cube 1. Is there any chance to define some snapshot reuse scenarios like that (same model/DB tables referred) ... so the modelling time can be shortened while playing with the cube design ??? (i.e. testing various optimizations like joint dimensions, etc. - those should not be impacted by the source data stored in the already calculated snapshots, right ?) Obviously that should be an option while scheduling a Cube Build - to enable/disable reuse of snapshots from other similar cubes. was: Hello Kylin gurus, while debugging some issues with high-cardinality dimensions - which obviously requires large data to be processed to emulate the problem, thus the Cube Build process takes significant time ... I came to this idea: - Cannot the snapshot logic be reused across cubes ?? - Let's say I have cube 1 and cube 2, which is a clone of cube 1, maybe with some dimensions removed, or even having the same dimensions and just a different measures definition ... - Cube 1 build fails somewhere in the later steps (snapshots already built in step 1, I believe) - Running the build of the 2nd cube - which let's say uses exactly the same dimension tables and in fact also the same fact table - also requires a long run, because in Step 1 the build process is calculating the snapshots ... which are already calculated (and still not discarded) by the build job of cube 1. Is there any chance to define some snapshot reuse scenarios like that (same model/DB tables referred) ... so the modelling time can be shortened while playing with the cube design ??? (i.e. testing various optimizations like joint dimensions, etc. - those should not be impacted by the source data stored in the already calculated snapshots, right ?)
> Feature request - cross cube reuse of Kylin fact/lookup snapshots ...
> -
>
> Key: KYLIN-1837
> URL: https://issues.apache.org/jira/browse/KYLIN-1837
> Project: Kylin
> Issue Type: Improvement
> Components: Job Engine
> Affects Versions: all
> Reporter: Richard Calaba
> Assignee: Dong Li
>
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KYLIN-1837) Feature request - cross cube reuse of Kylin fact/lookup snapshots ...
Richard Calaba created KYLIN-1837: - Summary: Feature request - cross cube reuse of Kylin fact/lookup snapshots ... Key: KYLIN-1837 URL: https://issues.apache.org/jira/browse/KYLIN-1837 Project: Kylin Issue Type: Improvement Components: Job Engine Affects Versions: all Reporter: Richard Calaba Assignee: Dong Li Hello Kylin gurus, while debugging some issues with high-cardinality dimensions - which obviously requires large data to be processed to emulate the problem, thus the Cube Build process takes significant time ... I came to this idea: - Cannot the snapshot logic be reused across cubes ?? - Let's say I have cube 1 and cube 2, which is a clone of cube 1, maybe with some dimensions removed, or even having the same dimensions and just a different measures definition ... - Cube 1 build fails somewhere in the later steps (snapshots already built in step 1, I believe) - Running the build of the 2nd cube - which let's say uses exactly the same dimension tables and in fact also the same fact table - also requires a long run, because in Step 1 the build process is calculating the snapshots ... which are already calculated (and still not discarded) by the build job of cube 1. Is there any chance to define some snapshot reuse scenarios like that (same model/DB tables referred) ... so the modelling time can be shortened while playing with the cube design ??? (i.e. testing various optimizations like joint dimensions, etc. - those should not be impacted by the source data stored in the already calculated snapshots, right ?) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1835) Error: java.lang.NumberFormatException: For input count_distinct on Big Int ??? (#7 Step Name: Build Base Cuboid Data)
[ https://issues.apache.org/jira/browse/KYLIN-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353824#comment-15353824 ] Richard Calaba commented on KYLIN-1835: --- After additional analysis - I believe it is related to the high-cardinality field/dimension in the fact table on which the distinct counts are required ... The bitmap encoding ensuring the exact precision of the distinct counts (claiming support for integer types) seems to work only for integers smaller than 4 bytes - the bigint is causing the issue. I am testing this theory by changing the count_distinct precision to allow some error rate (<1.22%) ... Can anyone confirm/reject my observations/conclusions meanwhile ??? The cube rebuild process will take several hours ...
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
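The exact-versus-approximate distinction at the heart of this thread can be sketched with stdlib types: a bitmap indexes by int, so it is inherently bound to 32-bit keys, while exact counting of 64-bit keys needs a far heavier set structure - which is why an approximate counter such as hllc16 is the practical fallback at scale. A hedged illustration, not Kylin's actual data structures:

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Sketch of the design limit behind KYLIN-1835: an exact bitmap counter
// (RoaringBitmap-style) indexes by int, so bigint keys cannot use it.
public class DistinctCountSketch {

    // Exact distinct count over non-negative int keys via a bitmap.
    static int distinctInts(int[] keys) {
        BitSet bitmap = new BitSet();
        for (int k : keys)
            bitmap.set(k);        // set() takes an int index -- the 32-bit limit
        return bitmap.cardinality();
    }

    // Exact distinct count over 64-bit keys needs a set (far more memory per
    // key), which is why an approximate HLL counter is preferred at scale.
    static int distinctLongs(long[] keys) {
        Set<Long> seen = new HashSet<>();
        for (long k : keys)
            seen.add(k);
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctInts(new int[]{1, 5, 5, 9}));                        // 3
        System.out.println(distinctLongs(new long[]{-6628245177096591402L, 42L, 42L})); // 2
    }
}
```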
[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353818#comment-15353818 ] Richard Calaba commented on KYLIN-1834: --- Additional info: I further suspect that this exception is rooted in the problem that we are trying to use high cardinality dimension (customer) with approx 10 milion values and customer_id defined as Bigint. Seems that the Kylin engine is trying to build dictionary for this dimension and this is failing we also defined count_distinct measure for this dimension with exact precision which needs to use bitmap encoding (which is supported for int types but has some issue - see my https://issues.apache.org/jira/browse/KYLIN-1835 - seems we might need to switch to caount_distinct calculations with some error rate expected (not using exatc precision which requires bitmap)). So the main question is - in case of 10 mio values (encoded in Bigint) - can we built Dictionary, which method to use ??? Do we have to built dictionary for high cardinality dimensions if the only thing we need is count_distinct measure ?? We do not need group by and where conditions for high cardinality dimension ... > java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build > Dimension Dictionary > -- > > Key: KYLIN-1834 > URL: https://issues.apache.org/jira/browse/KYLIN-1834 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Richard Calaba >Priority: Critical > Attachments: job_2016_06_28_09_59_12-value-not-found.zip > > > Getting exception in Step 4 - Build Dimension Dictionary: > java.lang.IllegalArgumentException: Value not exists! 
> at > org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160) > at > org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76) > at > org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96) > at > org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106) > at > org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) > at > org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > result code:2 > The code which generates the exception is: > org.apache.kylin.dimension.Dictionary.java: > /** > * A lower level API, return ID integer from raw value bytes. 
In case of > not found > * > * - if roundingFlag=0, throw IllegalArgumentException; > * - if roundingFlag<0, the closest smaller ID integer if exist; > * - if roundingFlag>0, the closest bigger ID integer if exist. > * > * Bypassing the cache layer, this could be significantly slower than > getIdFromValue(T value). > * > * @throws IllegalArgumentException > * if value is not found in dictionary and rounding is off; > * or if rounding cannot find a smaller or bigger ID > */ > final public int getIdFromValueBytes(byte[] value, int offset, int len, > int roundingFlag) throws IllegalArgumentException { > if (isNullByteForm(value, offset, len)) > return nullId(); > else { > int id = getIdFromValueBytesImpl(value, offset, len, > roundingFlag); > if (id < 0) > throw new IllegalArgumentException("Value not exists!"); > return id; > } > } >
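The roundingFlag contract in the quoted javadoc can be illustrated with a toy lookup over a sorted array standing in for Kylin's trie. This is a hypothetical sketch for explanation only (`RoundingLookupDemo` is not the actual TrieDictionary implementation):

```java
import java.util.Arrays;

// Toy model of the roundingFlag behavior documented in getIdFromValueBytes():
// here an ID is simply a position in a sorted value array (Kylin's real
// dictionary uses a trie, but the rounding rules are the same).
public class RoundingLookupDemo {
    static int getId(int[] sortedValues, int value, int roundingFlag) {
        int pos = Arrays.binarySearch(sortedValues, value);
        if (pos >= 0)
            return pos;                              // exact match: ID is the index
        int insertion = -pos - 1;                    // where the value would be inserted
        if (roundingFlag == 0)
            throw new IllegalArgumentException("Value not exists!");
        int id = (roundingFlag < 0) ? insertion - 1  // closest smaller ID, if it exists
                                    : insertion;     // closest bigger ID, if it exists
        if (id < 0 || id >= sortedValues.length)
            throw new IllegalArgumentException("Value not exists!");
        return id;
    }

    public static void main(String[] args) {
        int[] dict = {10, 20, 30};
        System.out.println(getId(dict, 25, -1)); // rounds down to the ID of 20
        System.out.println(getId(dict, 25, 1));  // rounds up to the ID of 30
    }
}
```

With roundingFlag=0 the same lookup throws the "Value not exists!" seen in the stack trace, which is why a value missing from the snapshot dictionary fails the build step.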
[jira] [Updated] (KYLIN-1836) Kylin 1.5+ New Aggregation Group - UI improvements
[ https://issues.apache.org/jira/browse/KYLIN-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Calaba updated KYLIN-1836: -- Description: After reading the Tech Blog - https://kylin.apache.org/blog/2016/02/18/new-aggregation-group/ from Hongbin Ma - I got a few ideas, mentioned below, to help Cube designers understand the impact of their cube design on Build and Query performance - see below: BTW: thank you for putting this Blog together !!! and thank you for referencing this blog through the Kylin UI - the link in the Aggregation Groups section !! - it is a very powerful optimization technique. Idea 1 = It would be great if the Advanced Settings section of the UI could calculate the exact number of Cuboids defined by every Aggregation Group (# of combinations; # of pruned combinations (based on Hier/Joint and Mandatory Dimensions)) and then also show the overall total of Cuboids considering ALL the defined Aggregation Groups. Idea 2 = The Aggregation Group section is about optimizing the # of necessary cuboids, assuming you know the query patterns. This is sometimes easy, but for more complex dashboards where multiple people work on defining the queries this is hard to control and guess; thus I would suggest adding a new tab in the Monitor Kylin UI - next to Jobs and Slow Queries, add an additional tab "Non-satisfied Queries" showing the queries which Kylin was not able to evaluate - queries which end with a "No Realization" exception. Together with the query SQL (including all the parameters), it would help to show the "missing dimension name" used in the query which was the cause for not finding a proper Cuboid. Idea 3 = Can anyone also document the Rowkeys section in the same part of the UI (Advanced Settings)??? It is not really clear what effect it will have if I start playing with the Rowkeys section (adding/removing dimension fields; adding non-dimension fields, ...). 
All I understand is that the "Rowkeys" section impacts only the HBase storage of calculated cuboids. Thus it doesn't have much impact on Cube Build time (except that the Trie for the dictionary needs to be built for every rowkey specified on this tab). I understand that the major impact of the Rowkeys section is thus only on HBase size / region splits and therefore also on query execution time. What I am confused about is whether I can define a high-cardinality dimension in the Cube and remove it from the Rowkeys section ??? What would happen to HBase storage and the expected query time ... would that dimension still be query-enabled ?? The closest explanation I found is this reply from Yu Feng here: http://apache-kylin.74782.x6.nabble.com/Relationship-between-rowkey-column-length-and-cube-size-td3174.html == Reply: Cube size determines how to split regions for the table in HBase after generating all cuboid files. For example, if all of your cuboid files total 100GB, your cube size is set to "SMALL", and the property for SMALL is 10GB, Kylin will create the HBase table with 10 regions. It will calculate the start rowkey and end rowkey of every region before creating the htable, then create the table with that split information. Rowkey column length is another thing: you can choose to either use a dictionary or set the rowkey column length for every dimension. If you use a dictionary, Kylin will build a dictionary for this column (Trie tree), meaning every value of the dimension will be encoded as a unique number value; because the dimension value is part of the HBase rowkey, the dictionary will reduce the HBase table size. However, Kylin stores the dictionary in memory, so if the dimension cardinality is large, it will become a problem. If you set the rowkey column length to N for a dimension, Kylin will not build a dictionary for it, and every value will be cut to an N-length string; so, no dictionary in memory, but rowkeys in the HBase table will be longer. 
== Additionally, a very light explanation of the Rowkeys section is here: https://kylin.apache.org/docs15/tutorial/create_cube.html = Rowkeys: the rowkeys are composed of the dimension encoded values. “Dictionary” is the default encoding method; if a dimension is not a fit for dictionary encoding (e.g., cardinality > 10 million), select “false” and then enter a fixed length for that dimension, usually the max. length of that column; if a value is longer than that size it will be truncated. Please note, without dictionary encoding, the cube size might be much bigger. You can drag & drop a dimension column to adjust its position in the rowkey; put the mandatory dimensions at the beginning, followed by the dimensions that are heavily involved in filters (where condition). Put high-cardinality dimensions ahead of low-cardinality dimensions.
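The fixed-length behavior described in that tutorial excerpt (values longer than N get truncated) can be sketched as follows. `FixedLenEncodeDemo` and the space-padding choice are illustrative simplifications of my own, not Kylin's actual fixed-length encoder:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of fixed-length rowkey encoding: every value becomes exactly
// `length` bytes, truncating long values and padding short ones, so no
// in-memory dictionary is needed at the cost of wider rowkeys.
public class FixedLenEncodeDemo {
    static byte[] encode(String value, int length) {
        byte[] src = value.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[length];
        Arrays.fill(out, (byte) ' ');                                    // pad short values
        System.arraycopy(src, 0, out, 0, Math.min(src.length, length));  // truncate long ones
        return out;
    }

    public static void main(String[] args) {
        // A 14-char value encoded at length 8 keeps only the first 8 bytes.
        System.out.println(new String(encode("CUSTOMER_12345", 8), StandardCharsets.UTF_8));
    }
}
```

This is the trade-off the reply above describes: no dictionary in memory, but every rowkey carries the full N bytes and distinct values that share an N-byte prefix collide.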
[jira] [Updated] (KYLIN-1836) Kylin 1.5+ New Aggregation Group - UI improvement
[ https://issues.apache.org/jira/browse/KYLIN-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Calaba updated KYLIN-1836: -- Description: After reading the Tech Blog - https://kylin.apache.org/blog/2016/02/18/new-aggregation-group/ from Hongbin Ma - I got a few ideas, mentioned below, to help Cube designers understand the impact of their cube design on Build and Query performance - see below: BTW: thank you for putting this Blog together !!! and thank you for referencing this blog through the Kylin UI - the link in the Aggregation Groups section !! - it is a very powerful optimization technique. Idea 1 = It would be great if the Advanced Settings section of the UI could calculate the exact number of Cuboids defined by every Aggregation Group (# of combinations; # of pruned combinations (based on Hier/Joint and Mandatory Dimensions)) and then also show the overall total of Cuboids considering ALL the defined Aggregation Groups. Idea 2 = The Aggregation Group section is about optimizing the # of necessary cuboids, assuming you know the query patterns. This is sometimes easy, but for more complex dashboards where multiple people work on defining the queries this is hard to control and guess; thus I would suggest adding a new tab in the Monitor Kylin UI - next to Jobs and Slow Queries, add an additional tab "Non-satisfied Queries" showing the queries which Kylin was not able to evaluate - queries which end with a "No Realization" exception. Together with the query SQL (including all the parameters), it would help to show the "missing dimension name" used in the query which was the cause for not finding a proper Cuboid. Idea 3 = Can anyone also document the Rowkeys section in the same part of the UI (Advanced Settings)??? It is not really clear what effect it will have if I start playing with the Rowkeys section (adding/removing dimension fields; adding non-dimension fields, ...). 
All I understand is that the "Rowkeys" section impacts only the HBase storage of calculated cuboids. Thus it doesn't have much impact on Cube Build time (except that the Trie for the dictionary needs to be built for every rowkey specified on this tab). I understand that the major impact of the Rowkeys section is thus only on HBase size / region splits and therefore also on query execution time. What I am confused about is whether I can define a high-cardinality dimension in the Cube and remove it from the Rowkeys section ??? What would happen to HBase storage and the expected query time ... would that dimension still be query-enabled ?? The closest explanation I found is this reply from Yu Feng here: http://apache-kylin.74782.x6.nabble.com/Relationship-between-rowkey-column-length-and-cube-size-td3174.html == Reply: Cube size determines how to split regions for the table in HBase after generating all cuboid files. For example, if all of your cuboid files total 100GB, your cube size is set to "SMALL", and the property for SMALL is 10GB, Kylin will create the HBase table with 10 regions. It will calculate the start rowkey and end rowkey of every region before creating the htable, then create the table with that split information. Rowkey column length is another thing: you can choose to either use a dictionary or set the rowkey column length for every dimension. If you use a dictionary, Kylin will build a dictionary for this column (Trie tree), meaning every value of the dimension will be encoded as a unique number value; because the dimension value is part of the HBase rowkey, the dictionary will reduce the HBase table size. However, Kylin stores the dictionary in memory, so if the dimension cardinality is large, it will become a problem. If you set the rowkey column length to N for a dimension, Kylin will not build a dictionary for it, and every value will be cut to an N-length string; so, no dictionary in memory, but rowkeys in the HBase table will be longer. 
== was: After reading the Tech Blog - https://kylin.apache.org/blog/2016/02/18/new-aggregation-group/ from Hongbin Ma - I got a few ideas, mentioned below, to help Cube designers understand the impact of their cube design on Build and Query performance - see below: BTW: thank you for putting this Blog together !!! and thank you for referencing this blog through the Kylin UI - the link in the Aggregation Groups section !! - it is a very powerful optimization technique. Idea 1 = It would be great if the Advanced Settings section of the UI could calculate the exact number of Cuboids defined by every Aggregation Group (# of combinations; # of pruned combinations (based on Hier/Joint and Mandatory Dimensions)) and then also show the overall total of Cuboids considering ALL the defined Aggregation Groups. Idea 2 = The Aggregation Group section is about optimizing the # of necessary cuboids
[jira] [Updated] (KYLIN-1836) Kylin 1.5+ New Aggregation Group - UI improvements
[ https://issues.apache.org/jira/browse/KYLIN-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Calaba updated KYLIN-1836: -- Summary: Kylin 1.5+ New Aggregation Group - UI improvements (was: Kylin 1.5+ New Aggregation Group - UI improvement) > Kylin 1.5+ New Aggregation Group - UI improvements > -- > > Key: KYLIN-1836 > URL: https://issues.apache.org/jira/browse/KYLIN-1836 > Project: Kylin > Issue Type: Improvement >Affects Versions: v1.5.0, v1.5.1, v1.5.2, v1.5.3, v1.5.2.1 >Reporter: Richard Calaba > > After reading the Tech Blog - > https://kylin.apache.org/blog/2016/02/18/new-aggregation-group/ from Hongbin > Ma - I got a few ideas mentioned below - to help the Cube designers understand > impact of their cube design on the Build and Query performance - see below: > BTW: thank you for putting this Blog together !!! and thank you for > referencing this blog through Kylin UI - link in the Aggregation Groups > section !! - it is a very powerful optimization technique. > Idea 1 > = > It would be great if the Advanced Settings section on UI can calculate the > exact number of Cuboids defined by every Aggregation Group (# of combinations > ; # of pruned combinations (based on Hier/Joint and Mandatory Dimensions) and > then also showing the overall total of Cuboids considering ALL the defined > Aggregation Groups. > Idea 2 > = > As Aggregation Group section is about optimizing # of necessary cuboids > assuming you know the query patterns. This is sometimes easy but for more > complex dashboards where multiple people work on defining the queries this is > hard to control and guess, thus I would suggest adding a new Tab in the > Monitor Kylin UI - next to Job and Slow Queries add additional tab > "Non-satisfied Queries" showing the Queries which were not able to be > evaluated by Kylin - queries which end with "No Realization" exception. 
> Together with the Query SQL (including all the parameters) it would help to > show the "missing dimension name" used in the query which was the cause for > not finding proper Cuboid. > Idea 3 > = > Can anyone also document the section Rowkeys in the same section of UI > (Advanced Settings) ??? It is not really clear what effect it will have if I > start playing with the Rowkeys section (adding/removing dimension fields; > adding non-dimension fields, ...). All I understand is that the "Rowkeys" > section has impact only on HBase storage of calculated cuboids. Thus doesn't > have impact on Cube Build time that much (except the impact that the Trie for > dictionary needs to be built for every specified rowkey on this tab). I > understand that the major impact of Rowkeys section is thus only on HBase > size / regions split and thus also on the Query execution time. > What I am confused with is whether I can define high-cardinality dimension in > Cube and remove it from the Rowkeys section ??? What would happen in HBase > storage and expected Query time ...would that dimension be still > query-enabled ?? > The closest explanation I found is this Reply from - Yu Feng's here > http://apache-kylin.74782.x6.nabble.com/Relationship-between-rowkey-column-length-and-cube-size-td3174.html > == > Reply: Cube size determines how to split region for table in hbase after > generate > all cuboid files, for example, If all of your cuboid file size is 100GB, > your cube size set to "SMALL", and the property for SMALL is 10GB, kylin > will create hbase table with 10 regions. it will calculate every start > rowkey and end rowkey of every region before create htable. then create > table with that split information. 
> Rowkey column length is another thing, you can choose either use dictionary > or set rowkey column length for every dimension , If you use dictionary, > kylin will build dictionary for this column(Trie tree), it means every > value of the dimension will be encoded as a unique number value, because > dimension value is a part of hbase rowkey, so it will reduce hbase table > size with dictionary. However, kylin stores the dictionary in memory, if > dimension cardinality is large, it will become something bad. If you set > rowkey > column length to N for one dimension, kylin will not build dictionary for > it, and every value will be cut to an N-length string, so, no dictionary > in memory, rowkey in hbase table will be longer. > == -- This message was sent by Atlassian JIRA (v6.3.4#6332)
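Idea 1 in the quoted description amounts to simple arithmetic once the pruning rules from the referenced blog are applied: a mandatory dimension contributes a factor of 1, a hierarchy of depth h contributes (h + 1) prefix choices, a joint contributes 2 (all dimensions in or all out), and each remaining normal dimension contributes 2. A rough sketch of the per-group count (the method name and the decision to include the empty cuboid in the total are my own assumptions):

```java
// Back-of-envelope cuboid count for one aggregation group, following the
// hierarchy/joint/mandatory pruning rules from the new-aggregation-group blog.
public class CuboidCountDemo {
    static long cuboidCount(int normalDims, int[] hierarchyDepths, int jointCount) {
        long total = 1L << normalDims;  // 2^normal: each normal dim is in or out
        for (int h : hierarchyDepths)
            total *= (h + 1);           // only prefixes of a hierarchy are valid
        total <<= jointCount;           // each joint is all-or-nothing: factor 2
        return total;                   // includes the empty cuboid
    }

    public static void main(String[] args) {
        // 2 normal dims, one 2-level hierarchy, one joint: 2^2 * 3 * 2 combinations.
        System.out.println(cuboidCount(2, new int[]{2}, 1));
    }
}
```

Showing this total per group in the Advanced Settings UI (and the sum across groups) is exactly what Idea 1 asks for.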
[jira] [Created] (KYLIN-1836) Kylin 1.5+ New Aggregation Group - UI improvement
Richard Calaba created KYLIN-1836: - Summary: Kylin 1.5+ New Aggregation Group - UI improvement Key: KYLIN-1836 URL: https://issues.apache.org/jira/browse/KYLIN-1836 Project: Kylin Issue Type: Improvement Affects Versions: v1.5.2, v1.5.1, v1.5.0, v1.5.3, v1.5.2.1 Reporter: Richard Calaba After reading the Tech Blog - https://kylin.apache.org/blog/2016/02/18/new-aggregation-group/ from Hongbin Ma - I got a few ideas mentioned below - to help the Cube designers understand impact of their cube design on the Build and Query performance - see below: BTW: thank you for putting this Blog together !!! and thank you for referencing this blog through Kylin UI - link in the Aggregation Groups section !! - it is a very powerful optimization technique. Idea 1 = It would be great if the Advanced Settings section on UI can calculate the exact number of Cuboids defined by every Aggregation Group (# of combinations ; # of pruned combinations (based on Hier/Joint and Mandatory Dimensions) and then also showing the overall total of Cuboids considering ALL the defined Aggregation Groups. Idea 2 = As Aggregation Group section is about optimizing # of necessary cuboids assuming you know the query patterns. This is sometimes easy but for more complex dashboards where multiple people work on defining the queries this is hard to control and guess, thus I would suggest adding a new Tab in the Monitor Kylin UI - next to Job and Slow Queries add additional tab "Non-satisfied Queries" showing the Queries which were not able to be evaluated by Kylin - queries which end with "No Realization" exception. Together with the Query SQL (including all the parameters) it would help to show the "missing dimension name" used in the query which was the cause for not finding proper Cuboid. Idea 3 = Can anyone also document the section Rowkeys in the same section of UI (Advanced Settings) ??? 
It is not really clear what effect it will have if I start playing with the Rowkeys section (adding/removing dimension fields; adding non-dimension fields, ...). All I understand is that the "Rowkeys" section has impact only on HBase storage of calculated cuboids. Thus doesn't have impact on Cube Build time that much (only that the Trie for dictionary needs to be built for every specified rowkey) - the major impact it has is on HBase size / regions split and thus also Query time. What I am for example confused with is if I can define high-cardinality dimension in Cube and remove it from the Rowkeys section ??? What would happen in HBase storage and expected Query time ... The closest explanation I found is this from - Yu Feng's reply --http://apache-kylin.74782.x6.nabble.com/Relationship-between-rowkey-column-length-and-cube-size-td3174.html ==Reply: Cube size determines how to split region for table in hbase after generate all cuboid files, for example, If all of your cuboid file size is 100GB, your cube size set to "SMALL", and the property for SMALL is 10GB, kylin will create hbase table with 10 regions. it will calculate every start rowkey and end rowkey of every region before create htable. then create table with that split information. Rowkey column length is another thing, you can choose either use dictionary or set rowkey column length for every dimension , If you use dictionary, kylin will build dictionary for this column(Trie tree), it means every value of the dimension will be encoded as a unique number value, because dimension value is a part of hbase rowkey, so it will reduce hbase table size with dictionary. However, kylin stores the dictionary in memory, if dimension cardinality is large, it will become something bad. If you set rowkey column length to N for one dimension, kylin will not build dictionary for it, and every value will be cut to an N-length string, so, no dictionary in memory, rowkey in hbase table will be longer. 
== -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KYLIN-1835) Error: java.lang.NumberFormatException: For input count_distinct on Big Int ??? (#7 Step Name: Build Base Cuboid Data)
Richard Calaba created KYLIN-1835: - Summary: Error: java.lang.NumberFormatException: For input count_distinct on Big Int ??? (#7 Step Name: Build Base Cuboid Data) Key: KYLIN-1835 URL: https://issues.apache.org/jira/browse/KYLIN-1835 Project: Kylin Issue Type: Bug Affects Versions: v1.5.2, v1.5.2.1 Reporter: Richard Calaba Priority: Critical I believe I have discovered an error in Kylin related to count_distinct with exact precision. I am not 100% sure - but everything points to the fact that there is a design limit for count_distinct ... please assess / confirm / reject my observation. Background info: = - large fact table ~ 100 million rows. - large customer dimension ~ 10 million rows. Defined 2 KPIs of type COUNT_DISTINCT - with exact precision (return type bitmap) on 2 high-cardinality fields of type Bigint. The Cube Build runs fine until #7 Step Name: Build Base Cuboid Data - where it errors out without further details in the Kylin Log - it shows only "no counters for job job_1463699962519_16085". The MR Logs of the job job_1463699962519_16085 show exceptions: 2016-06-28 02:22:24,019 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.NumberFormatException: For input string: "-6628245177096591402" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Integer.parseInt(Integer.java:495) at java.lang.Integer.parseInt(Integer.java:527) at org.apache.kylin.measure.bitmap.BitmapCounter.add(BitmapCounter.java:63) at org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:106) at org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:98) at org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValueOf(BaseCuboidMapperBase.java:189) at org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValue(BaseCuboidMapperBase.java:159) at org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.outputKV(BaseCuboidMapperBase.java:206) at 
org.apache.kylin.engine.mr.steps.HiveToBaseCuboidMapper.map(HiveToBaseCuboidMapper.java:53) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Just from reading the signature of the exception and connecting it to the measure return type "bitmap", it looks like choosing exact precision (which the UI says is supported for int types) causes this exception because I am passing a Bigint field. If so -> is that a bug or a design limitation ??? Can count_distinct not be implemented for bigint (with exact precision), or do I have to use count_distinct with an error rate instead ??? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
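The stack trace points at Integer.parseInt inside BitmapCounter.add, which suggests the exact-precision bitmap works with 32-bit IDs; a value like -6628245177096591402 simply does not fit in an int. A minimal demonstration of that parse limit (this shows plain JDK behavior, not Kylin code; the class and method names are my own):

```java
// Demonstrates the 32-bit parse limit behind the NumberFormatException above:
// Integer.parseInt() rejects any value outside [-2^31, 2^31 - 1], so a Bigint
// column fed into an int-based bitmap measure fails exactly this way.
public class BitmapIntLimitDemo {
    // True when the value survives the int parse used before adding to the bitmap.
    static boolean fitsInt(String value) {
        try {
            Integer.parseInt(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(fitsInt("12345"));                // true
        System.out.println(fitsInt("-6628245177096591402")); // false - the value from the MR log
    }
}
```

That is consistent with the UI note that exact precision is "supported for int types": anything wider than int either needs a dictionary mapping values down to int IDs or the approximate (error-rate) count_distinct.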
[jira] [Updated] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Calaba updated KYLIN-1834: -- Attachment: job_2016_06_28_09_59_12-value-not-found.zip > java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build > Dimension Dictionary > -- > > Key: KYLIN-1834 > URL: https://issues.apache.org/jira/browse/KYLIN-1834 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Richard Calaba >Priority: Critical > Attachments: job_2016_06_28_09_59_12-value-not-found.zip > > > Getting exception in Step 4 - Build Dimension Dictionary: > java.lang.IllegalArgumentException: Value not exists! > at > org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160) > at > org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76) > at > org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96) > at > org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106) > at > org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) > at > org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > 
org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > result code:2 > The code which generates the exception is: > org.apache.kylin.dimension.Dictionary.java: > /** > * A lower level API, return ID integer from raw value bytes. In case of > not found > * > * - if roundingFlag=0, throw IllegalArgumentException; > * - if roundingFlag<0, the closest smaller ID integer if exist; > * - if roundingFlag>0, the closest bigger ID integer if exist. > * > * Bypassing the cache layer, this could be significantly slower than > getIdFromValue(T value). > * > * @throws IllegalArgumentException > * if value is not found in dictionary and rounding is off; > * or if rounding cannot find a smaller or bigger ID > */ > final public int getIdFromValueBytes(byte[] value, int offset, int len, > int roundingFlag) throws IllegalArgumentException { > if (isNullByteForm(value, offset, len)) > return nullId(); > else { > int id = getIdFromValueBytesImpl(value, offset, len, > roundingFlag); > if (id < 0) > throw new IllegalArgumentException("Value not exists!"); > return id; > } > } > == > The Cube is big - fact 110 mio rows, the largest dimension (customer) has 10 > mio rows. I have increased the JVM -Xmx to 16gb and set the > kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube > build doesn't fail (previously we were getting exception complaining about > the 300MB limit for Dimension dictionary size (req. approx 700MB)). 
> == > Before that we were getting an exception complaining about the Dictionary > encoding problem - "Too high cardinality is not suitable for dictionary -- > cardinality: 10873977" - this we resolved by changing the affected > dimension/row key Encoding from "dict" to "int; length=8" on the Advanced > Settings of the Cube. > == > We have 2 high-cardinality fields (one from fact table and one from the big > dimension (customer - see above). We need to
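For reference, the snapshot-size workaround mentioned above is a kylin.properties setting; the fragment below uses the value reported in this issue, not a general recommendation (the JVM -Xmx increase is configured separately, typically in Kylin's setenv script):

```properties
# Raise the dimension dictionary / lookup snapshot size cap; the default
# complained at the 300MB limit while the customer snapshot needed ~700MB.
kylin.table.snapshot.max_mb=2048
```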
[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353486#comment-15353486 ] Richard Calaba commented on KYLIN-1834: --- The exception is preceded by an additional error explaining which value is the problem: 2016-06-28 08:07:35,215 ERROR [pool-2-thread-1] dict.TrieDictionary:173 : Not a valid value: -4270603867011174754 2016-06-28 08:07:35,220 ERROR [pool-2-thread-1] execution.AbstractExecutable:62 : error execute HadoopShellExecutable{id=9549b14f-d25c-408b-8027-841f4fb94298-03, name=Build Dimension Dictionary, state=RUNNING} java.lang.IllegalArgumentException: Value not exists! --- checking now which field it relates to ... --- also attaching the Diagnostic logs collected > java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build > Dimension Dictionary > -- > > Key: KYLIN-1834 > URL: https://issues.apache.org/jira/browse/KYLIN-1834 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Richard Calaba >Priority: Critical > Attachments: job_2016_06_28_09_59_12-value-not-found.zip > > > Getting exception in Step 4 - Build Dimension Dictionary: > java.lang.IllegalArgumentException: Value not exists! 
> at > org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160) > at > org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96) > at > org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76) > at > org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96) > at > org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106) > at > org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) > at > org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > result code:2 > The code which generates the exception is: > org.apache.kylin.dimension.Dictionary.java: > /** > * A lower level API, return ID integer from raw value bytes. 
In case of > not found > * > * - if roundingFlag=0, throw IllegalArgumentException; > * - if roundingFlag<0, the closest smaller ID integer if exist; > * - if roundingFlag>0, the closest bigger ID integer if exist. > * > * Bypassing the cache layer, this could be significantly slower than > getIdFromValue(T value). > * > * @throws IllegalArgumentException > * if value is not found in dictionary and rounding is off; > * or if rounding cannot find a smaller or bigger ID > */ > final public int getIdFromValueBytes(byte[] value, int offset, int len, > int roundingFlag) throws IllegalArgumentException { > if (isNullByteForm(value, offset, len)) > return nullId(); > else { > int id = getIdFromValueBytesImpl(value, offset, len, > roundingFlag); > if (id < 0) > throw new IllegalArgumentException("Value not exists!"); > return id; > } > } > == > The Cube is big - fact 110 mio rows, the largest dimension (customer) has 10 > mio rows. I have increased the JVM -Xmx to 16gb and set the > kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube > build doesn't fail (previously we were getting exception complaining about > the 300MB limit for Dimension dictionary size (req. approx 700MB)). >
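The roundingFlag contract in the Javadoc quoted above can be illustrated with a small self-contained sketch. This is not Kylin code - the class below is hypothetical and uses a sorted string array where TrieDictionary uses a trie - but the lookup semantics mirror the three cases listed in the comment:

```java
import java.util.Arrays;

// Hypothetical stand-in for a dictionary: ids are indexes into a
// sorted value array, as in Kylin's dictionary encodings.
public class RoundingLookup {
    private final String[] sorted;

    public RoundingLookup(String[] values) {
        this.sorted = values.clone();
        Arrays.sort(this.sorted);
    }

    // Mirrors the Dictionary.getIdFromValueBytes contract: roundingFlag=0
    // demands an exact match, <0 falls back to the closest smaller id,
    // >0 to the closest bigger id; otherwise "Value not exists!" is thrown.
    public int getId(String value, int roundingFlag) {
        int i = Arrays.binarySearch(sorted, value);
        if (i >= 0)
            return i;                          // exact hit
        int insertion = -i - 1;                // where the value would go
        if (roundingFlag < 0 && insertion > 0)
            return insertion - 1;              // closest smaller id
        if (roundingFlag > 0 && insertion < sorted.length)
            return insertion;                  // closest bigger id
        throw new IllegalArgumentException("Value not exists!");
    }
}
```

For a dictionary built from {"a", "c", "e"}, looking up "b" returns id 0 with roundingFlag=-1, id 1 with roundingFlag=1, and throws with roundingFlag=0 - the exact-match path, which is the one failing in the stack trace above.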
[jira] [Updated] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Calaba updated KYLIN-1834: -- Description: Getting exception in Step 4 - Build Dimension Dictionary: java.lang.IllegalArgumentException: Value not exists! at org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160) at org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158) at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96) at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76) at org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96) at org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106) at org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215) at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60) at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) result code:2 The code which generates the exception is: 
org.apache.kylin.dimension.Dictionary.java: /** * A lower level API, return ID integer from raw value bytes. In case of not found * * - if roundingFlag=0, throw IllegalArgumentException; * - if roundingFlag<0, the closest smaller ID integer if exist; * - if roundingFlag>0, the closest bigger ID integer if exist. * * Bypassing the cache layer, this could be significantly slower than getIdFromValue(T value). * * @throws IllegalArgumentException * if value is not found in dictionary and rounding is off; * or if rounding cannot find a smaller or bigger ID */ final public int getIdFromValueBytes(byte[] value, int offset, int len, int roundingFlag) throws IllegalArgumentException { if (isNullByteForm(value, offset, len)) return nullId(); else { int id = getIdFromValueBytesImpl(value, offset, len, roundingFlag); if (id < 0) throw new IllegalArgumentException("Value not exists!"); return id; } } == The Cube is big - fact 110 mio rows, the largest dimension (customer) has 10 mio rows. I have increased the JVM -Xmx to 16gb and set the kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube build doesn't fail (previously we were getting an exception complaining about the 300MB limit for Dimension dictionary size (req. approx 700MB)). == Before that we were getting an exception complaining about a Dictionary encoding problem - "Too high cardinality is not suitable for dictionary -- cardinality: 10873977" - this we resolved by changing the affected dimension/row key Encoding from "dict" to "int; length=8" on the Advanced Settings of the Cube. == We have 2 high-cardinality fields (one from the fact table and one from the big dimension (customer - see above)). We need to use them in a distinct_count measure for our calculations. I wonder if this "Value not exists!" exception is somewhat related ??? 
Those count_distinct measures are defined one with return type "bitmap" (exact precision - only for Int columns) and the 2nd with return type "hllc16" (error rate <= 1.22 %) == I am looking for any clues to debug the cause of this error and a way to circumvent it ...
[jira] [Updated] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Calaba updated KYLIN-1834: -- Description: Getting exception in Step 4 - Build Dimension Dictionary: java.lang.IllegalArgumentException: Value not exists! at org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160) at org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158) at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96) at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76) at org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96) at org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106) at org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215) at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60) at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) result code:2 The code which generates the exception is: 
org.apache.kylin.dimension.Dictionary.java: /** * A lower level API, return ID integer from raw value bytes. In case of not found * * - if roundingFlag=0, throw IllegalArgumentException; * - if roundingFlag<0, the closest smaller ID integer if exist; * - if roundingFlag>0, the closest bigger ID integer if exist. * * Bypassing the cache layer, this could be significantly slower than getIdFromValue(T value). * * @throws IllegalArgumentException * if value is not found in dictionary and rounding is off; * or if rounding cannot find a smaller or bigger ID */ final public int getIdFromValueBytes(byte[] value, int offset, int len, int roundingFlag) throws IllegalArgumentException { if (isNullByteForm(value, offset, len)) return nullId(); else { int id = getIdFromValueBytesImpl(value, offset, len, roundingFlag); if (id < 0) throw new IllegalArgumentException("Value not exists!"); return id; } } == The Cube is big - fact 110 mio rows, the largest dimension (customer) has 15 mio rows. I have increased the JVM -Xmx to 16gb and set the kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube build doesn't fail (previously we were getting an exception complaining about the 300MB limit for Dimension dictionary size (req. approx 700MB)). == Before that we were getting an exception complaining about a Dictionary encoding problem - "Too high cardinality is not suitable for dictionary -- cardinality: 10873977" - this we resolved by changing the affected dimension/row key Encoding from "dict" to "int; length=8" on the Advanced Settings of the Cube. == We have 2 high-cardinality fields (one from the fact table and one from the big dimension (customer - see above)). We need to use them in a distinct_count measure for our calculations. I wonder if this "Value not exists!" exception is somewhat related ??? 
Those count_distinct measures are defined one with return type "bitmap" (exact precision - only for Int columns) and the 2nd with return type "hllc16" (error rate <= 1.22 %) == I am looking for any clues to debug the cause of this error and a way to circumvent it ...
[jira] [Updated] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Calaba updated KYLIN-1834: -- Description: Getting exception in Step 4 - Build Dimension Dictionary: java.lang.IllegalArgumentException: Value not exists! at org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160) at org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158) at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96) at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76) at org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96) at org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106) at org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215) at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60) at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) result code:2 The code which generates the exception is: 
org.apache.kylin.dimension.Dictionary.java: /** * A lower level API, return ID integer from raw value bytes. In case of not found * * - if roundingFlag=0, throw IllegalArgumentException; * - if roundingFlag<0, the closest smaller ID integer if exist; * - if roundingFlag>0, the closest bigger ID integer if exist. * * Bypassing the cache layer, this could be significantly slower than getIdFromValue(T value). * * @throws IllegalArgumentException * if value is not found in dictionary and rounding is off; * or if rounding cannot find a smaller or bigger ID */ final public int getIdFromValueBytes(byte[] value, int offset, int len, int roundingFlag) throws IllegalArgumentException { if (isNullByteForm(value, offset, len)) return nullId(); else { int id = getIdFromValueBytesImpl(value, offset, len, roundingFlag); if (id < 0) throw new IllegalArgumentException("Value not exists!"); return id; } } == The Cube is big - fact 110 mio rows, the largest dimension (customer) has 15 mio rows. I have increased the JVM -Xmx to 16gb and set the kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube build doesn't fail (previously we were getting an exception complaining about the 300MB limit for Dimension dictionary size (req. approx 700MB)). == Before that we were getting an exception complaining about a Dictionary encoding problem - "Too high cardinality is not suitable for dictionary -- cardinality: 10873977" - this we resolved by changing the affected Encoding from "dict" to "int; length=8" == Those 2 high-cardinality fields (one from the fact table and one from the big dimension (see above)) we need to use in a distinct_count measure for our calculations. I wonder if this is somewhat related ??? == I am looking for any clues to debug the cause of this error and a way to circumvent it ...
[jira] [Created] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
Richard Calaba created KYLIN-1834: - Summary: java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary Key: KYLIN-1834 URL: https://issues.apache.org/jira/browse/KYLIN-1834 Project: Kylin Issue Type: Bug Affects Versions: v1.5.2, v1.5.2.1 Reporter: Richard Calaba Priority: Critical Getting exception in Step 4 - Build Dimension Dictionary: java.lang.IllegalArgumentException: Value not exists! at org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160) at org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158) at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96) at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76) at org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96) at org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106) at org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215) at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60) at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) result code:2 The code which generates the exception is: org.apache.kylin.dimension.Dictionary.java: /** * A lower level API, return ID integer from raw value bytes. In case of not found * * - if roundingFlag=0, throw IllegalArgumentException; * - if roundingFlag<0, the closest smaller ID integer if exist; * - if roundingFlag>0, the closest bigger ID integer if exist. * * Bypassing the cache layer, this could be significantly slower than getIdFromValue(T value). * * @throws IllegalArgumentException * if value is not found in dictionary and rounding is off; * or if rounding cannot find a smaller or bigger ID */ final public int getIdFromValueBytes(byte[] value, int offset, int len, int roundingFlag) throws IllegalArgumentException { if (isNullByteForm(value, offset, len)) return nullId(); else { int id = getIdFromValueBytesImpl(value, offset, len, roundingFlag); if (id < 0) throw new IllegalArgumentException("Value not exists!"); return id; } } == The Cube is big - fact 110 mio rows, the largest dimension (customer) has 15 mio entries. I have increased the JVM -Xmx to 16gb and set the kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube build doesn't fail (previously we were getting an exception complaining about the 300MB limit for Dimension dictionary size (req. approx 700MB)). == Before that we were getting an exception complaining about a Dictionary encoding problem - "Too high cardinality is not suitable for dictionary -- cardinality: 10873977" - this we resolved by changing the affected Encoding from "dict" to "int; length=8" == Those 2 high-cardinality fields (one from the fact table and one from the big dimension (see above)) we need to use in a distinct_count measure for our calculations. I wonder if this is somewhat related ??? == I am looking for any clues to debug the cause of this error and a way to circumvent it ... 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KYLIN-1829) Add execution of "utility" classes to the System tab of Kylin UI
Richard Calaba created KYLIN-1829: - Summary: Add execution of "utility" classes to the System tab of Kylin UI Key: KYLIN-1829 URL: https://issues.apache.org/jira/browse/KYLIN-1829 Project: Kylin Issue Type: Improvement Reporter: Richard Calaba There is a bunch of "hidden" and/or semi-documented classes in the Kylin engine which can be very useful for standard maintenance of a healthy Kylin instance - it would be very good if those were connected to the System tab of the Kylin UI so administrators can run them directly from the Kylin UI and collect their execution logs there as well. A few candidates: 1) ./bin/kylin.sh org.apache.kylin.storage.hbase.util.StorageCleanupJob 2) ./bin/kylin.sh org.apache.kylin.cube.cli.CubeSignatureRefresher 3) ./bin/metastore.sh for backup/restore actions, with support to list the content of the ./meta_backups directory. In addition I found many Kylin engine classes with a main method (thus executable tools for Kylin) which might also be good candidates to be integrated here - I just do not know their functions/parameters. To know them I would have to read the code of all of them. Maybe the authors of the tools can create an official list with documentation of the supported parameters - and this list/docu can also be part of the System tab on the UI. A few other classes I found - executable through kylin.sh - some of those are probably already connected to the UI in 1.5.2.1 (like the diagnostics CLI ..). But some of those I still do not know what they are for .. though the classes having CLI in the name give the impression that those should be "stable" internal interfaces for the Kylin engine. 
./bin/kylin.sh org.apache.kylin.tool.CubeMetaExtractor ./bin/kylin.sh org.apache.kylin.tool.DiagnosisInfoCLI ./bin/kylin.sh org.apache.kylin.tool.HBaseUsageExtractor ./bin/kylin.sh org.apache.kylin.tool.JobDiagnosisInfoCLI ./bin/kylin.sh org.apache.kylin.job.hadoop.invertedindex.IICLI - error ./bin/kylin.sh org.apache.kylin.common.KylinVersion - this would be good to have on UI for sure -> sometimes I do need to know exact Kylin version ./bin/kylin.sh org.apache.kylin.common.persistence.ResourceTool - this is basically metastore.sh ./bin/kylin.sh org.apache.kylin.cube.cli.DumpDictionaryCLI ./kylin.sh org.apache.kylin.cube.cuboid.CuboidCLI ./kylin.sh org.apache.kylin.query.QueryCli ... and some others -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KYLIN-1828) java.lang.StringIndexOutOfBoundsException in org.apache.kylin.storage.hbase.util.StorageCleanupJob
[ https://issues.apache.org/jira/browse/KYLIN-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352252#comment-15352252 ] Richard Calaba edited comment on KYLIN-1828 at 6/28/16 3:10 AM: Additional workaround for those who need to clean up the hive tables manually: To remove ALL kylin_intermediate hive tables from the default schema I run: hive -e 'use default;show tables "kylin_intermediate_*";' | xargs -I '{}' hive -e 'use default;drop table {}' Then running ./bin/kylin.sh org.apache.kylin.storage.hbase.util.StorageCleanupJob 2 times seems to have done the additional cleanups as well ... was (Author: cal...@gmail.com): Additional workaround for those who need to clean up the hive tables manually: To remove ALL kylin_intermediate hive tables from the default schema I run: hive -e 'use default;show tables "kylin_intermediate_*";' | xargs -I '{}' hive -e 'use default;drop table {}' > java.lang.StringIndexOutOfBoundsException in > org.apache.kylin.storage.hbase.util.StorageCleanupJob > -- > > Key: KYLIN-1828 > URL: https://issues.apache.org/jira/browse/KYLIN-1828 > Project: Kylin > Issue Type: Bug > Components: Job Engine >Affects Versions: v1.5.2.1 >Reporter: Richard Calaba > > While running storage cleanup job: > ./bin/kylin.sh org.apache.kylin.storage.hbase.util.StorageCleanupJob --delete > true > I see Hive tables in form > kylin_intermediate__1970010100_20160701031500 > in the default schema. 
> While running the above storage cleaner (v.1.5.2.1 - all previously built > Cubes Disabled & Dropped) I am getting an error: > 2016-06-27 22:28:08,480 INFO [main StorageCleanupJob:262]: Remove > intermediate hive table with job id fc44da88-cffc-4710-8726-ff910cf83451 with > job status ERROR > usage: StorageCleanupJob > -deleteDelete the unused storage > Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String > index out of range: -2 > at java.lang.String.substring(String.java:1904) > at > org.apache.kylin.storage.hbase.util.StorageCleanupJob.cleanUnusedIntermediateHiveTable(StorageCleanupJob.java:269) > at > org.apache.kylin.storage.hbase.util.StorageCleanupJob.run(StorageCleanupJob.java:91) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.kylin.storage.hbase.util.StorageCleanupJob.main(StorageCleanupJob.java:308) > 2016-06-27 22:28:08,486 INFO [Thread-0 > HConnectionManager$HConnectionImplementation:1907]: Closing zookeeper > sessionid=0x154c97461586119 > 2016-06-27 22:28:08,491 INFO [Thread-0 ZooKeeper:684]: Session: > 0x154c97461586119 closed > 2016-06-27 22:28:08,491 INFO [main-EventThread ClientCnxn:509]: EventThread > shut down -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-1828) java.lang.StringIndexOutOfBoundsException in org.apache.kylin.storage.hbase.util.StorageCleanupJob
[ https://issues.apache.org/jira/browse/KYLIN-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Calaba updated KYLIN-1828: -- Component/s: Job Engine > java.lang.StringIndexOutOfBoundsException in > org.apache.kylin.storage.hbase.util.StorageCleanupJob > -- > > Key: KYLIN-1828 > URL: https://issues.apache.org/jira/browse/KYLIN-1828 > Project: Kylin > Issue Type: Bug > Components: Job Engine >Affects Versions: v1.5.2.1 >Reporter: Richard Calaba > > While running storage cleanup job: > ./bin/kylin.sh org.apache.kylin.storage.hbase.util.StorageCleanupJob --delete > true > I see Hive tables in form > kylin_intermediate__1970010100_20160701031500 > in the defaul schema. > While running the above storage cleaner (v.1.5.2.1 - all previously built > Cubes Disabled & Dropped) I am getting an error: > 2016-06-27 22:28:08,480 INFO [main StorageCleanupJob:262]: Remove > intermediate hive table with job id fc44da88-cffc-4710-8726-ff910cf83451 with > job status ERROR > usage: StorageCleanupJob > -deleteDelete the unused storage > Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String > index out of range: -2 > at java.lang.String.substring(String.java:1904) > at > org.apache.kylin.storage.hbase.util.StorageCleanupJob.cleanUnusedIntermediateHiveTable(StorageCleanupJob.java:269) > at > org.apache.kylin.storage.hbase.util.StorageCleanupJob.run(StorageCleanupJob.java:91) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.kylin.storage.hbase.util.StorageCleanupJob.main(StorageCleanupJob.java:308) > 2016-06-27 22:28:08,486 INFO [Thread-0 > HConnectionManager$HConnectionImplementation:1907]: Closing zookeeper > sessionid=0x154c97461586119 > 2016-06-27 22:28:08,491 INFO [Thread-0 ZooKeeper:684]: Session: > 0x154c97461586119 closed > 2016-06-27 22:28:08,491 INFO [main-EventThread ClientCnxn:509]: EventThread > shut down -- This message was sent 
by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1828) java.lang.StringIndexOutOfBoundsException in org.apache.kylin.storage.hbase.util.StorageCleanupJob
[ https://issues.apache.org/jira/browse/KYLIN-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352250#comment-15352250 ] Richard Calaba commented on KYLIN-1828: --- Further analysis of the problem: StorageCleanupJob.java - lines 266 - 270: 266  while ((line = reader.readLine()) != null) { 267  if (line.startsWith("kylin_intermediate_")) { 268  boolean isNeedDel = false; 269  String uuid = line.substring(line.length() - uuidLength, line.length()); 270  uuid = uuid.replace("_", "-"); Obviously the "String uuid = line.substring(line.length() - uuidLength, line.length());" on line 269 is causing the StringIndexOutOfBoundsException -> not sure why - I do not have enough info about the assumed pattern for the table names -> but in my hive DB (default schema) I see kylin table names containing the cube (or model) name and not a uuid (kylin_intermediate_ - maybe this is the cause of the problem - not sure. The exception raised at line 269 further causes a bail-out from the method cleanUnusedIntermediateHiveTable through the call stack - causing the additional confusing message in the log: usage: StorageCleanupJob -delete Delete the unused storage which indicates that the class was not called with correct parameters - BUT it was - according to https://kylin.apache.org/docs/howto/howto_cleanup_storage.html. > java.lang.StringIndexOutOfBoundsException in > org.apache.kylin.storage.hbase.util.StorageCleanupJob > -- > > Key: KYLIN-1828 > URL: https://issues.apache.org/jira/browse/KYLIN-1828 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2.1 >Reporter: Richard Calaba > > While running storage cleanup job: > ./bin/kylin.sh org.apache.kylin.storage.hbase.util.StorageCleanupJob --delete > true > I see Hive tables in form > kylin_intermediate__1970010100_20160701031500 > in the default schema. 
> While running the above storage cleaner (v.1.5.2.1 - all previously built > Cubes Disabled & Dropped) I am getting an error: > 2016-06-27 22:28:08,480 INFO [main StorageCleanupJob:262]: Remove > intermediate hive table with job id fc44da88-cffc-4710-8726-ff910cf83451 with > job status ERROR > usage: StorageCleanupJob > -deleteDelete the unused storage > Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String > index out of range: -2 > at java.lang.String.substring(String.java:1904) > at > org.apache.kylin.storage.hbase.util.StorageCleanupJob.cleanUnusedIntermediateHiveTable(StorageCleanupJob.java:269) > at > org.apache.kylin.storage.hbase.util.StorageCleanupJob.run(StorageCleanupJob.java:91) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.kylin.storage.hbase.util.StorageCleanupJob.main(StorageCleanupJob.java:308) > 2016-06-27 22:28:08,486 INFO [Thread-0 > HConnectionManager$HConnectionImplementation:1907]: Closing zookeeper > sessionid=0x154c97461586119 > 2016-06-27 22:28:08,491 INFO [Thread-0 ZooKeeper:684]: Session: > 0x154c97461586119 closed > 2016-06-27 22:28:08,491 INFO [main-EventThread ClientCnxn:509]: EventThread > shut down -- This message was sent by Atlassian JIRA (v6.3.4#6332)
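The line-269 analysis above can be reproduced in isolation. The sketch below is not the actual StorageCleanupJob code; it assumes the 36-character uuid length implied by job ids like fc44da88-cffc-4710-8726-ff910cf83451, and shows that the unguarded substring throws exactly this exception for a shorter table name, alongside a guarded variant that skips such names instead:

```java
// Hypothetical reconstruction of the failing suffix extraction in
// StorageCleanupJob.cleanUnusedIntermediateHiveTable (line 269).
public class SuffixExtract {
    static final int UUID_LENGTH = 36; // assumed length of the uuid suffix

    // Unguarded, as on line 269: the begin index goes negative when the
    // table name is shorter than UUID_LENGTH, so substring throws.
    static String uuidSuffixUnguarded(String line) {
        return line.substring(line.length() - UUID_LENGTH).replace("_", "-");
    }

    // Guarded variant: returns null for table names that cannot carry a
    // full uuid suffix (e.g. names built from cube/model names instead).
    static String uuidSuffix(String line) {
        if (line.length() < UUID_LENGTH)
            return null;
        return line.substring(line.length() - UUID_LENGTH).replace("_", "-");
    }
}
```

For a table name of length 34 the begin index works out to -2, which would match the "String index out of range: -2" message in the reported log.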
[jira] [Created] (KYLIN-1828) java.lang.StringIndexOutOfBoundsException in org.apache.kylin.storage.hbase.util.StorageCleanupJob
Richard Calaba created KYLIN-1828: - Summary: java.lang.StringIndexOutOfBoundsException in org.apache.kylin.storage.hbase.util.StorageCleanupJob Key: KYLIN-1828 URL: https://issues.apache.org/jira/browse/KYLIN-1828 Project: Kylin Issue Type: Bug Affects Versions: v1.5.2.1 Reporter: Richard Calaba While running storage cleanup job: ./bin/kylin.sh org.apache.kylin.storage.hbase.util.StorageCleanupJob --delete true I see Hive tables in form kylin_intermediate__1970010100_20160701031500 in the default schema. While running the above storage cleaner (v.1.5.2.1 - all previously built Cubes Disabled & Dropped) I am getting an error: 2016-06-27 22:28:08,480 INFO [main StorageCleanupJob:262]: Remove intermediate hive table with job id fc44da88-cffc-4710-8726-ff910cf83451 with job status ERROR usage: StorageCleanupJob -delete Delete the unused storage Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -2 at java.lang.String.substring(String.java:1904) at org.apache.kylin.storage.hbase.util.StorageCleanupJob.cleanUnusedIntermediateHiveTable(StorageCleanupJob.java:269) at org.apache.kylin.storage.hbase.util.StorageCleanupJob.run(StorageCleanupJob.java:91) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.kylin.storage.hbase.util.StorageCleanupJob.main(StorageCleanupJob.java:308) 2016-06-27 22:28:08,486 INFO [Thread-0 HConnectionManager$HConnectionImplementation:1907]: Closing zookeeper sessionid=0x154c97461586119 2016-06-27 22:28:08,491 INFO [Thread-0 ZooKeeper:684]: Session: 0x154c97461586119 closed 2016-06-27 22:28:08,491 INFO [main-EventThread ClientCnxn:509]: EventThread shut down -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (KYLIN-1810) NPE in org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164)
[ https://issues.apache.org/jira/browse/KYLIN-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Calaba closed KYLIN-1810. - Resolution: Fixed > NPE in > org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164) > > > Key: KYLIN-1810 > URL: https://issues.apache.org/jira/browse/KYLIN-1810 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2.1 >Reporter: Richard Calaba > Attachments: job_2016_06_21_16_23_51-err.zip > > > Hello, > running into weird issue. I have designed Kylin cube. Clonned it to another > cube without any changes and run the Build job. The Build succeeded. Then I > have discarder the build job and disabled and dropped the cube. Clonned the > same cube again (into different name than previously) and then again started > to build the cube. Getting an NPE below every time in Step 4 - Build > Dimension Dictionary": > java.lang.NullPointerException > at > org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164) > at > org.apache.kylin.dict.lookup.SnapshotManager.checkDupByContent(SnapshotManager.java:167) > at > org.apache.kylin.dict.lookup.SnapshotManager.trySaveNewSnapshot(SnapshotManager.java:128) > at > org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:108) > at > org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) > at > org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > result code:2 > Attaching the Diagnostic logs. > Any clue how to resolve this ??? > I am thinking to wipe all Kylin metadata from repository and try to restore > from backup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KYLIN-1810) NPE in org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164)
[ https://issues.apache.org/jira/browse/KYLIN-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343640#comment-15343640 ] Richard Calaba edited comment on KYLIN-1810 at 6/22/16 3:20 AM: Ok, did: 1) git clone -b kylin-1.5.2.1 2) cd kylin 3) wget https://issues.apache.org/jira/secure/attachment/12804584/initialize_rowIndices_and_dict_during_deserializing_when_snapshot_table_is_empty.patch 4) git apply initialize_rowIndices_and_dict_during_deserializing_when_snapshot_table_is_empty.patch 5) rebuilt the kylin distro and started the patched kylin 6) Resumed the failed Cube jobs -> both finished OK Confirming the resolution & closing the ticket. Thank you ! > NPE in > org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164) > > > Key: KYLIN-1810 > URL: https://issues.apache.org/jira/browse/KYLIN-1810 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2.1 >Reporter: Richard Calaba > Attachments: job_2016_06_21_16_23_51-err.zip > > > Hello, > running into weird issue. I have designed Kylin cube. Clonned it to another > cube without any changes and run the Build job. The Build succeeded. Then I > have discarder the build job and disabled and dropped the cube. Clonned the > same cube again (into different name than previously) and then again started > to build the cube. 
Getting an NPE below every time in Step 4 - Build > Dimension Dictionary": > java.lang.NullPointerException > at > org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164) > at > org.apache.kylin.dict.lookup.SnapshotManager.checkDupByContent(SnapshotManager.java:167) > at > org.apache.kylin.dict.lookup.SnapshotManager.trySaveNewSnapshot(SnapshotManager.java:128) > at > org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:108) > at > org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) > at > org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > result code:2 > Attaching the Diagnostic logs. > Any clue how to resolve this ??? > I am thinking to wipe all Kylin metadata from repository and try to restore > from backup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KYLIN-1810) NPE in org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164)
[ https://issues.apache.org/jira/browse/KYLIN-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343208#comment-15343208 ] Richard Calaba edited comment on KYLIN-1810 at 6/22/16 2:25 AM: And the bad news is that I have created new cube based on new model - not sharing the fact table Hive View but sharing the Dimensions (Lookup Tables) Hive views -> the new cube also fails with same exception. Seems 'reuse' of the table snapshot across cube builds and across different cubes doesn't work ... was (Author: cal...@gmail.com): ANd the bad news is that I have created new cube based on new model - not hsaring the fact table Hive View but sharing the Dimensions (Lookup Tables) Hive views -> the new cube also fails with same exception. Seems 'reuse' of the table snapshot across cube builds and across different cubes doesn't work ... > NPE in > org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164) > > > Key: KYLIN-1810 > URL: https://issues.apache.org/jira/browse/KYLIN-1810 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2.1 >Reporter: Richard Calaba > Attachments: job_2016_06_21_16_23_51-err.zip > > > Hello, > running into weird issue. I have designed Kylin cube. Clonned it to another > cube without any changes and run the Build job. The Build succeeded. Then I > have discarder the build job and disabled and dropped the cube. Clonned the > same cube again (into different name than previously) and then again started > to build the cube. 
Getting an NPE below every time in Step 4 - Build > Dimension Dictionary": > java.lang.NullPointerException > at > org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164) > at > org.apache.kylin.dict.lookup.SnapshotManager.checkDupByContent(SnapshotManager.java:167) > at > org.apache.kylin.dict.lookup.SnapshotManager.trySaveNewSnapshot(SnapshotManager.java:128) > at > org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:108) > at > org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) > at > org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > result code:2 > Attaching the Diagnostic logs. > Any clue how to resolve this ??? > I am thinking to wipe all Kylin metadata from repository and try to restore > from backup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1810) NPE in org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164)
[ https://issues.apache.org/jira/browse/KYLIN-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343208#comment-15343208 ] Richard Calaba commented on KYLIN-1810: --- And the bad news is that I have created new cube based on new model - not sharing the fact table Hive View but sharing the Dimensions (Lookup Tables) Hive views -> the new cube also fails with same exception. Seems 'reuse' of the table snapshot across cube builds and across different cubes doesn't work ... > NPE in > org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164) > > > Key: KYLIN-1810 > URL: https://issues.apache.org/jira/browse/KYLIN-1810 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2.1 >Reporter: Richard Calaba > Attachments: job_2016_06_21_16_23_51-err.zip > > > Hello, > running into weird issue. I have designed Kylin cube. Clonned it to another > cube without any changes and run the Build job. The Build succeeded. Then I > have discarder the build job and disabled and dropped the cube. Clonned the > same cube again (into different name than previously) and then again started > to build the cube. 
Getting an NPE below every time in Step 4 - Build > Dimension Dictionary": > java.lang.NullPointerException > at > org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164) > at > org.apache.kylin.dict.lookup.SnapshotManager.checkDupByContent(SnapshotManager.java:167) > at > org.apache.kylin.dict.lookup.SnapshotManager.trySaveNewSnapshot(SnapshotManager.java:128) > at > org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:108) > at > org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) > at > org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > result code:2 > Attaching the Diagnostic logs. > Any clue how to resolve this ??? > I am thinking to wipe all Kylin metadata from repository and try to restore > from backup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-1810) NPE in org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164)
[ https://issues.apache.org/jira/browse/KYLIN-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Calaba updated KYLIN-1810: -- Description: Hello, running into weird issue. I have designed a Kylin cube. Cloned it to another cube without any changes and ran the Build job. The Build succeeded. Then I discarded the build job and disabled and dropped the cube. Cloned the same cube again (into a different name than previously) and then again started to build the cube. Getting an NPE below every time in Step 4 - "Build Dimension Dictionary": java.lang.NullPointerException at org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164) at org.apache.kylin.dict.lookup.SnapshotManager.checkDupByContent(SnapshotManager.java:167) at org.apache.kylin.dict.lookup.SnapshotManager.trySaveNewSnapshot(SnapshotManager.java:128) at org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:108) at org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215) at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60) at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114) at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) result code:2 Attaching the Diagnostic logs. Any clue how to resolve this ??? I am thinking to wipe all Kylin metadata from repository and try to restore from backup. was: the same description, followed by a pasted dump of the Kylin Monitor jobs page instead of the stack trace (three build jobs for cube JAMBAJUICE_3_0_REL_TRX_POS_CHECK_clone; the failing job ac090c87-496d-4173-9503-6a9ec97a764e in status ERROR after 5.80 mins, with steps #1 Create Intermediate Flat Hive Table, #2 Materialize Hive View in Lookup Tables and #3 Extract Fact Table Distinct Columns completed, step #4 Build Dimension Dictionary failing after 0.02 mins, and the remaining steps never started; dump truncated)
[jira] [Updated] (KYLIN-1810) NPE in org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164)
[ https://issues.apache.org/jira/browse/KYLIN-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Calaba updated KYLIN-1810: -- Attachment: job_2016_06_21_16_23_51-err.zip > NPE in > org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164) > > > Key: KYLIN-1810 > URL: https://issues.apache.org/jira/browse/KYLIN-1810 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2.1 >Reporter: Richard Calaba > Attachments: job_2016_06_21_16_23_51-err.zip > > > Hello, > running into weird issue. I have designed Kylin cube. Clonned it to another > cube without any changes and run the Build job. The Build succeeded. Then I > have discarder the build job and disabled and dropped the cube. Clonned the > same cube again (into different name than previously) and then again started > to build the cube. Getting an NPE below every time in Step 4 - Build > Dimension Dictionary": > [pasted Kylin Monitor jobs-page dump condensed: three build jobs for cube JAMBAJUICE_3_0_REL_TRX_POS_CHECK_clone; failing job ac090c87-496d-4173-9503-6a9ec97a764e in status ERROR after 5.80 mins; steps #1-#3 completed, step #4 Build Dimension Dictionary failed after 0.02 mins, steps #5-#21 never started] > Output > java.lang.NullPointerException > at > org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164) > at > org.apache.kylin.dict.lookup.SnapshotManager.checkDupByContent(SnapshotManager.java:167) > at > org.apache.kylin.dict.lookup.SnapshotManager.trySaveNewSnapshot(SnapshotManager.java:128) > at > org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:108) > at > org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) > at > org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) > at
[jira] [Commented] (KYLIN-1704) When load empty snapshot, NULL Pointer Exception occurs
[ https://issues.apache.org/jira/browse/KYLIN-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334202#comment-15334202 ] Richard Calaba commented on KYLIN-1704: --- Sorry - cannot reproduce anymore - before I was getting the above mentioned NPE in Build Cube Step 3. Now in the same Build Step 3 I am getting another exception: java.lang.IllegalStateException: Dup key found, key=[null], value1=[null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,20160324], value2=[null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,20160324] at org.apache.kylin.dict.lookup.LookupTable.initRow(LookupTable.java:83) at org.apache.kylin.dict.lookup.LookupTable.init(LookupTable.java:66) at org.apache.kylin.dict.lookup.LookupTable.<init>(LookupTable.java:54) at org.apache.kylin.dict.lookup.LookupStringTable.<init>(LookupStringTable.java:33) Could be that data has changed meanwhile ... but not 100% sure ... so do not want to confuse you ... Are these duplicate Lookup table entry checks something new in Kylin 1.5.2.1 ??? Asking as the previous Cube I cloned from was built successfully but on an earlier Kylin version ... 
> When load empty snapshot, NULL Pointer Exception occurs > --- > > Key: KYLIN-1704 > URL: https://issues.apache.org/jira/browse/KYLIN-1704 > Project: Kylin > Issue Type: Bug > Components: Metadata >Affects Versions: v1.5.0, v1.5.1, v1.5.2 >Reporter: Zhong Yanghong >Assignee: Zhong Yanghong > Fix For: v1.5.3 > > Attachments: > initialize_rowIndices_and_dict_during_deserializing_when_snapshot_table_is_empty.patch > > > Error Log: java.lang.NullPointerException > at > org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:163) > at > org.apache.kylin.dict.lookup.SnapshotManager.checkDupByContent(SnapshotManager.java:164) > at > org.apache.kylin.dict.lookup.SnapshotManager.trySaveNewSnapshot(SnapshotManager.java:125) > at > org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:105) > at > org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:205) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) > at > org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:105) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:105) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
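The pattern behind this NPE is an equals() that dereferences internal state which deserialization may leave null for an empty snapshot. A null-guarded sketch (illustrative class, not Kylin's actual SnapshotTable; per the attached patch title, the real fix instead initializes rowIndices and dict during deserialization):

```java
import java.util.Collections;
import java.util.List;

// Illustrative sketch, NOT Kylin's SnapshotTable: comparing snapshot
// contents in equals(). If an empty snapshot is deserialized with its
// row list left null, an unguarded "rows.equals(other.rows)" throws
// NullPointerException; treating null as an empty list avoids it.
public class SnapshotSketch {
    private final List<String> rows; // may be null for an empty snapshot

    public SnapshotSketch(List<String> rows) { this.rows = rows; }

    private List<String> rowsOrEmpty() {
        return rows == null ? Collections.<String>emptyList() : rows;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SnapshotSketch)) return false;
        return rowsOrEmpty().equals(((SnapshotSketch) o).rowsOrEmpty());
    }

    @Override
    public int hashCode() { return rowsOrEmpty().hashCode(); }
}
```

Either approach (initializing the fields or null-guarding the comparison) makes two empty snapshots compare equal instead of crashing the checkDupByContent path.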
[jira] [Commented] (KYLIN-1704) When load empty snapshot, NULL Pointer Exception occurs
[ https://issues.apache.org/jira/browse/KYLIN-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333448#comment-15333448 ] Richard Calaba commented on KYLIN-1704: --- Ok - will try to reproduce and attach logs, tomorrow - is 2:30am here in California :) > When load empty snapshot, NULL Pointer Exception occurs > --- > > Key: KYLIN-1704 > URL: https://issues.apache.org/jira/browse/KYLIN-1704 > Project: Kylin > Issue Type: Bug > Components: Metadata >Reporter: Zhong Yanghong >Assignee: Zhong Yanghong > Attachments: > initialize_rowIndices_and_dict_during_deserializing_when_snapshot_table_is_empty.patch > > > Error Log: java.lang.NullPointerException > at > org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:163) > at > org.apache.kylin.dict.lookup.SnapshotManager.checkDupByContent(SnapshotManager.java:164) > at > org.apache.kylin.dict.lookup.SnapshotManager.trySaveNewSnapshot(SnapshotManager.java:125) > at > org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:105) > at > org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:205) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59) > at > org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42) > at > org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:105) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:105) > at > 
org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1789) Couldn't use View as Lookup when join type is "inner"
[ https://issues.apache.org/jira/browse/KYLIN-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332920#comment-15332920 ] Richard Calaba commented on KYLIN-1789: --- Additional info - also obtained from Bhanu Mohanty - the reporter of the bug: While looking into logs he saw a message: Dup key found, key=[-289271615434074838,-7076210457049756771], value1=[2016-03-24,-289271615434074838,-7076210457049756771,Medium,6,4,,null,null,null,null,null], value2=[2016-03-24,-289271615434074838,-7076210457049756771,Medium,6,4,,null,null,null,null,null] at org.apache.kylin.dict.lookup.LookupTable.initRow(LookupTable.java:83) That would indicate that the Inner Join expected to find only one record in the lookup table ... if this is true and the Inner Join on a Lookup requires exactly one record, that is a clear inconsistency (1st, the model is defined as FACT INNER JOIN LOOKUP ON key1, key2, ..., so this has to allow multiple candidates in the lookup (even if this is not typical); 2nd, there is no reason why a Left Outer join would accept duplicate key entries in the right operand while an Inner join won't allow it) > Couldn't use View as Lookup when join type is "inner" > - > > Key: KYLIN-1789 > URL: https://issues.apache.org/jira/browse/KYLIN-1789 > Project: Kylin > Issue Type: Bug > Components: Job Engine >Affects Versions: v1.5.2, v1.5.2.1 >Reporter: Shaofeng SHI >Assignee: Shaofeng SHI > Fix For: v1.5.3 > > > Reported by Bhanu Mohanty in user mailing list: > I am using kylin-1.5.2.1 > Added hive view as a look up table > Getting error at Build Dimension Dictionary > DEFAULT.kylin_intermediate_DEFAULT_* > If the join is "inner" > It works when I changed the join to "left" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
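The "Dup key found" message comes from building a key-to-row index for the lookup snapshot: the build assumes the join key uniquely identifies a row and fails fast on the second row with the same key. A compact sketch of that check (illustrative only, not Kylin's actual LookupTable.initRow; class and method names are mine):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch, NOT Kylin's LookupTable: index lookup-table rows
// by their join key and fail fast on duplicates, mirroring the
// "Dup key found" IllegalStateException seen in the build logs.
public class LookupSketch {

    static Map<String, String[]> indexByKey(String[][] rows, int keyCol) {
        Map<String, String[]> index = new HashMap<>();
        for (String[] row : rows) {
            String[] prev = index.put(row[keyCol], row);
            if (prev != null) {
                throw new IllegalStateException(
                        "Dup key found, key=[" + row[keyCol]
                        + "], value1=" + Arrays.toString(prev)
                        + ", value2=" + Arrays.toString(row));
            }
        }
        return index;
    }
}
```

A lookup view whose key columns are not actually unique trips a check of this shape, which matches the duplicate-key entries in the log above.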
[jira] [Comment Edited] (KYLIN-1576) Support of new join type in the Cube Model - Temporal Join
[ https://issues.apache.org/jira/browse/KYLIN-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330956#comment-15330956 ] Richard Calaba edited comment on KYLIN-1576 at 6/15/16 9:03 PM: Interestingly - I just found out that Apache Drill supports this scenario - at least two years ago they patched it to support this if at least one equi join condition is used: https://issues.apache.org/jira/browse/DRILL-485 So questions: 1) Why can Hive not implement the same - seems the argument in the docs doesn't hold anymore ... 2) Should / Can the Kylin Cube Build use Apache Drill while building the data cubes ??? Update in regards to 2) - Can Kylin Cube Build use Apache Drill ... seems this is on eBay's roadmap ... considering Slide 68 here: http://events.mapr.com/BayAreaApacheDrill and here https://www.slideshare.net/secret/lMvMrzy9mFeyBP/?utm_source=ion_medium=Meetups_campaign=ION_MKT_HUG_ApacheDrill_GA. Moreover it seems they might implement the idea of running a Drill Query in case a Kylin Query cannot be evaluated ... that's awesome ... and the same idea we are trying to implement at Fishbowl > Support of new join type in the Cube Model - Temporal Join > -- > > Key: KYLIN-1576 > URL: https://issues.apache.org/jira/browse/KYLIN-1576 > Project: Kylin > Issue Type: New Feature > Components: General >Affects Versions: Future >Reporter: Richard Calaba >Priority: Blocker > > There is a notion of time-dependent master data in many business scenarios. 
> Typically modeled with granularity of 1 day (DATE_FROM, DATE_TO fields of type > DATE defining the validity period of one master data record). Occasionally a finer > granularity makes sense, so use of TIMESTAMP can also be seen as a valid > scenario. An example of such a master data definition could be: > Master Data / Dimension Table: > = > KEY: PRODUCT_ID, DATE_TO, > NON-KEY: DATE_FROM, PRODUCT_DESCRIPTION > - assuming that PRODUCT_DESCRIPTION cannot have 2 values during one day, it is > assumed that DATE_FROM <= DATE_TO and also that there are no overlapping > intervals (DATE_FROM, DATE_TO) for any PRODUCT master data > - the KEY is then intentionally defined as (PRODUCT_ID, DATE_TO) so the > statement SELECT * FROM PRODUCT WHERE ID = 'prod_key_1' AND DATE_TO >= > today/now AND DATE_FROM <= today/now is an efficient way to retrieve the 'current' > PRODUCT master data (description). The today/now is also known as the 'key > date'. > - now if I have transaction data (FACT table) of product sales, i.e.: > SALES_DATE, PRODUCT_ID, STORE_ID, > I would like to show the products sold at a store on a certain date and also show > the description of the product at the date of the product sale (assuming here > that there is a product catalog which can be updated independently, but for > auditing purposes the original product description used during the sale needs > to be displayed/used). > The SQL for the temporal join would then be: > SELECT S.PRODUCT_ID, S.SALES_DATE, P.PRODUCT_DESCRIPTION > FROM SALES AS S LEFT OUTER JOIN PRODUCT AS P > ON S.PRODUCT_ID = P.PRODUCT_ID > AND S.SALES_DATE >= P.DATE_FROM > AND S.SALES_DATE <= P.DATE_TO > (an INNER TEMPORAL JOIN can also be defined and be valid in some scenarios, but > in this case it wouldn't be the proper join - we need to show the product sales > even if the description wasn't maintained in the product master data) > (some more details on temporal joins - see e.g. 
here - > http://scn.sap.com/community/hana-in-memory/blog/2014/09/28/using-temporal-join-to-fetch-the-result-set-within-the-time-interval > ) > This scenario could be supported by Kylin if the following enhancement were > made: > 1) The Cube Model allowing to define a special variant of LEFT OUTER and INNER > joins (suggested name: temporal (left outer/inner) join) which forces the user to > specify a 'key date' as an expression (column / constant / ...) from the FACT > table and 2 validity fields ('valid from' and 'valid to') from the LOOKUP > table. Those 2 validity fields define the master data record's validity > period. Supported types for those fields should be DATE; optionally TIMESTAMP > is also fine but rarely used in business scenarios. > Another option, rather than defining a new join type, is to loosen the join > condition and allow <= and >= operands to be
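The temporal LEFT OUTER JOIN quoted in the issue description can be sketched end-to-end. This is a runnable illustration using SQLite; the PRODUCT/SALES tables and sample data are hypothetical, modeled on the example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Time-dependent master data: (PRODUCT_ID, DATE_TO) is the key;
# the (DATE_FROM, DATE_TO) intervals must not overlap per product.
cur.execute("""CREATE TABLE PRODUCT (
    PRODUCT_ID TEXT, DATE_FROM TEXT, DATE_TO TEXT, PRODUCT_DESCRIPTION TEXT,
    PRIMARY KEY (PRODUCT_ID, DATE_TO))""")
cur.executemany("INSERT INTO PRODUCT VALUES (?, ?, ?, ?)", [
    ("p1", "2016-01-01", "2016-03-31", "Old description"),
    ("p1", "2016-04-01", "9999-12-31", "New description"),
])

cur.execute("CREATE TABLE SALES (SALES_DATE TEXT, PRODUCT_ID TEXT, STORE_ID TEXT)")
cur.executemany("INSERT INTO SALES VALUES (?, ?, ?)", [
    ("2016-02-15", "p1", "s1"),   # falls into the first validity interval
    ("2016-05-01", "p1", "s1"),   # falls into the second validity interval
    ("2016-05-02", "p2", "s2"),   # no master data at all -> NULL description
])

# Temporal LEFT OUTER JOIN: the equi-join on PRODUCT_ID is combined with
# two range predicates on the validity interval (SALES_DATE is the key date).
rows = cur.execute("""
    SELECT S.PRODUCT_ID, S.SALES_DATE, P.PRODUCT_DESCRIPTION
    FROM SALES AS S LEFT OUTER JOIN PRODUCT AS P
      ON S.PRODUCT_ID = P.PRODUCT_ID
     AND S.SALES_DATE >= P.DATE_FROM
     AND S.SALES_DATE <= P.DATE_TO
    ORDER BY S.SALES_DATE""").fetchall()
for row in rows:
    print(row)
```

Because the join is a LEFT OUTER JOIN, the sale of "p2" still appears with a NULL description, which is exactly the auditing requirement stated in the issue.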
[jira] [Commented] (KYLIN-1786) Frontend work for KYLIN-1313
[ https://issues.apache.org/jira/browse/KYLIN-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332470#comment-15332470 ] Richard Calaba commented on KYLIN-1786: --- Hello - is it possible to request a change from Minor to Critical on this issue? And as [~mahongbin] pointed out in https://issues.apache.org/jira/browse/KYLIN-1313, please include on the UI the option to derive a dimension also from the fact table! That should already be supported by the backend. The reason for Critical is: 1) It relates to https://issues.apache.org/jira/browse/KYLIN-1313 - which is Major priority. 2) The lack of this function on the UI (derived dimensions for the fact table) is a major drawback in the JIRA issue https://issues.apache.org/jira/browse/KYLIN-1576 - which is trying to resolve the functionality gap for so-called time-dependent attributes. It is possible to use some view-based workarounds to bring time-dependent attributes in as fact table attributes (calculated by a lookup to the dimension table with respect to the date column in the fact table), but not being able to define those time-dependent attributes as a derived dimension causes significant inefficiency while building the cube. > Frontend work for KYLIN-1313 > > > Key: KYLIN-1786 > URL: https://issues.apache.org/jira/browse/KYLIN-1786 > Project: Kylin > Issue Type: Improvement > Components: Web >Reporter: Dong Li >Assignee: Zhong,Jason >Priority: Minor > Attachments: 屏幕快照 2016-06-15 12.22.54.png > > > KYLIN-1313 introduced a measure called extendedcolumn, but it seems not enabled > on the WebUI; see the attached screenshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KYLIN-1576) Support of new join type in the Cube Model - Temporal Join
[ https://issues.apache.org/jira/browse/KYLIN-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330891#comment-15330891 ] Richard Calaba edited comment on KYLIN-1576 at 6/15/16 1:04 AM: One workaround - not generic, but it solves the case where the temporal join logic is composed of: - an equality join on the entity ID (id1, ... id-N) - 2 non-equality joins on the entity validity (date_from, date_to) is to define a new fact table which includes the original fact table plus the time-dependent attributes. So instead of the old join condition in the model: FROM fact_table LEFT OUTER JOIN time_dependent_attrs ON fact_table.id1 = time_dependent_attrs.id1 (AND fact_table.id-N = time_dependent_attrs.id-N)* AND fact_table.transaction_date <= time_dependent_attrs.date_to AND fact_table.transaction_date >= time_dependent_attrs.date_from you can define a new fact table this way to achieve the same logic: CREATE TABLE/VIEW fact_table_new AS SELECT fact.*, timedep.attr1, timedep.attr2 FROM fact_table AS fact LEFT OUTER JOIN time_dependent_attrs AS timedep ON fact.id1 = timedep.id1 (AND fact.id-N = timedep.id-N)* WHERE fact.transaction_date BETWEEN timedep.date_from AND timedep.date_to; The drawback of this solution: you have to model all time-dependent attributes (if they need to be used for grouping) as separate Normal dimensions - Kylin cannot utilize the optimized logic for Derived dimensions. So this solution is practical only for a small number of time-dependent attributes. If this JIRA ticket (shown as resolved) - https://issues.apache.org/jira/browse/KYLIN-1313 - were true ... then you should be able to define those time-dependent attributes as part of a Derived dimension ... 
but unfortunately I didn't find out how to do it on the UI in Kylin 1.5.2. The 2nd workaround is (instead of creating a new fact table) to create a new dimension table (lookup table) where you map the records from the time-dependent table to the keys of the original fact table: CREATE TABLE/VIEW new_dim_table AS SELECT fact.id1, (fact.id-N)*, timedep.* FROM fact_table AS fact LEFT OUTER JOIN time_dependent_attrs AS timedep ON fact.id1 = timedep.id1 (AND fact.id-N = timedep.id-N)* WHERE fact.transaction_date BETWEEN timedep.date_from AND timedep.date_to; And then you can use the Kylin Model to define: fact_table INNER JOIN new_dim_table (you do not have to use LEFT OUTER JOIN here anymore) ON fact_table.id1 = new_dim_table.id1 (AND fact_table.id-N = new_dim_table.id-N)* This way you will get a dim table of the same size as the fact table -> but you can still utilize Derived dimension benefits in Kylin.
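The 2nd workaround (the pre-joined lookup table) can be sketched with SQLite. All table and column names here (fact_table, time_dependent_attrs, attr1) are hypothetical placeholders matching the comment's notation; note the lookup join below also includes transaction_date so that each fact row matches exactly one lookup row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE fact_table (id1 TEXT, transaction_date TEXT, amount REAL)")
cur.executemany("INSERT INTO fact_table VALUES (?, ?, ?)", [
    ("c1", "2016-02-01", 10.0),
    ("c1", "2016-05-01", 20.0),
])

cur.execute("""CREATE TABLE time_dependent_attrs (
    id1 TEXT, date_from TEXT, date_to TEXT, attr1 TEXT)""")
cur.executemany("INSERT INTO time_dependent_attrs VALUES (?, ?, ?, ?)", [
    ("c1", "2016-01-01", "2016-03-31", "segment A"),
    ("c1", "2016-04-01", "9999-12-31", "segment B"),
])

# Workaround 2: materialize the temporal join into a lookup table keyed by
# the fact table's own key columns, so the cube model only needs a plain
# equi-join and attr1 can be modeled as a derived dimension.
cur.execute("""CREATE VIEW new_dim_table AS
    SELECT fact.id1, fact.transaction_date, timedep.attr1
    FROM fact_table AS fact
    LEFT OUTER JOIN time_dependent_attrs AS timedep
      ON fact.id1 = timedep.id1
    WHERE fact.transaction_date BETWEEN timedep.date_from AND timedep.date_to""")

# The model can now use a simple INNER equi-join against the lookup view.
rows = cur.execute("""
    SELECT f.id1, f.transaction_date, f.amount, d.attr1
    FROM fact_table AS f
    INNER JOIN new_dim_table AS d
      ON f.id1 = d.id1 AND f.transaction_date = d.transaction_date
    ORDER BY f.transaction_date""").fetchall()
for row in rows:
    print(row)
```

As the comment notes, the lookup view has the same row count as the fact table, which is the price paid for regaining the derived-dimension optimization.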
[jira] [Commented] (KYLIN-1313) Enable deriving dimensions on non PK/FK
[ https://issues.apache.org/jira/browse/KYLIN-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330963#comment-15330963 ] Richard Calaba commented on KYLIN-1313: --- Similar question - playing with Kylin-1.5.2 today, I didn't see anywhere on the UI the ability to specify that a dimension can be derived from the fact table ... where is it? > Enable deriving dimensions on non PK/FK > --- > > Key: KYLIN-1313 > URL: https://issues.apache.org/jira/browse/KYLIN-1313 > Project: Kylin > Issue Type: Improvement >Reporter: hongbin ma >Assignee: hongbin ma > Fix For: v1.5.2 > > > currently a derived column has to be a column on the lookup table, and the derived > host column has to be a PK/FK (it's also a problem when the lookup table grows > very large). Sometimes columns on the fact table exhibit a deriving relationship > too. Here's an example fact table: > (dt date, seller_id bigint, seller_name varchar(100), item_id bigint, > item_url varchar(1000), count decimal, price decimal) > seller_name is uniquely determined by each seller_id, and item_url is > uniquely determined by each item_id. Users do not expect to do > filtering on columns like seller_name or item_url; they just want to retrieve > them when grouping/filtering on other dimensions like seller_id, item_id > or even dt. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
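The derived-dimension idea discussed in KYLIN-1313 can be illustrated with a small sketch (this is not Kylin's actual implementation; the seller data and the query helper are hypothetical): because seller_name is uniquely determined by seller_id, the cube only needs to index seller_id, and the name can be attached after aggregation from a small snapshot:

```python
# Snapshot of the deriving relationship (host column -> derived column).
seller_names = {101: "Alice's Store", 102: "Bob's Store"}

# Pre-aggregated cuboid keyed by the host column only; the derived column
# seller_name never enters the rowkey.
cuboid = {("2016-06-15", 101): 42.0,
          ("2016-06-15", 102): 17.5}

def query(dt):
    """Group by dt and seller, attaching the derived seller_name post-aggregation."""
    return [(dt, sid, seller_names.get(sid), total)
            for (d, sid), total in sorted(cuboid.items()) if d == dt]

for row in query("2016-06-15"):
    print(row)
```

The point of the issue is that this trick was only available when the host column is a lookup-table PK/FK; allowing it for fact-table columns like seller_id keeps cuboids small without losing the derived attributes.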