[jira] [Commented] (KYLIN-2106) UI bug - Advanced Settings - Rowkeys - new Integer dictionary encoding - could possibly impact also cube metadata

2016-10-20 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15592904#comment-15592904
 ] 

Richard Calaba commented on KYLIN-2106:
---

Found it myself - as nicely described here :) - 
https://stackoverflow.com/questions/21903805/how-to-download-a-single-commit-diff-from-github

To get this patch: 
https://github.com/apache/kylin/commit/1e049817856ede06c7c8736ad1d608765f301a21.patch
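
As a sketch of the trick from the Stack Overflow answer: GitHub serves a 
mailbox-format patch for any commit when ".patch" is appended to the commit 
URL. The snippet below just builds that URL for the commit above; the 
curl/git am lines are commented out since they need network access and a 
kylin checkout, and the file name fix.patch is an illustrative choice.

```shell
# Build the ".patch" URL for a GitHub commit by appending ".patch"
# to the commit URL (the behaviour described in the SO answer above).
REPO_URL="https://github.com/apache/kylin"
COMMIT="1e049817856ede06c7c8736ad1d608765f301a21"
PATCH_URL="${REPO_URL}/commit/${COMMIT}.patch"
echo "$PATCH_URL"

# To download and apply it (needs network access and a kylin checkout):
#   curl -fsSL -o fix.patch "$PATCH_URL"
#   git am fix.patch          # or: git apply --check fix.patch  (dry run)
```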


> UI bug - Advanced Settings - Rowkeys - new Integer dictionary encoding - 
> could possibly impact also cube metadata
> -
>
> Key: KYLIN-2106
> URL: https://issues.apache.org/jira/browse/KYLIN-2106
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine, Web 
>Affects Versions: v1.5.4.1
>Reporter: Richard Calaba
>Assignee: Zhong,Jason
>
> I have realized a possible bug in the UI; to reproduce:
> 1) Create a cube that, in Advanced Settings - Rowkeys, has the encoding of 
> one of the dimensions (in my case the 1st dimension, customer_id) set to 
> 'Integer' - I used length 22. (Do not use the deprecated 'Int'.)
> 2) Save this cube.
> 3) Clone this cube.
> 4) The cloned cube had the same dimension in the Rowkeys section in Edit mode 
> marked as 'Int (deprecated)', and the length of the Int dictionary encoding 
> was set to 'ger' - obviously some issue while parsing the new encoding type - 
> hopefully only in the UI ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-2106) UI bug - Advanced Settings - Rowkeys - new Integer dictionary encoding - could possibly impact also cube metadata

2016-10-20 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15592660#comment-15592660
 ] 

Richard Calaba commented on KYLIN-2106:
---

Hi, I see - this is in the master branch, right? Can we generate a patch for 
1.5.4.1, as it is the latest release? Is there an easy way to generate that 
patch from Git?



[jira] [Commented] (KYLIN-2106) UI bug - Advanced Settings - Rowkeys - new Integer dictionary encoding - could possibly impact also cube metadata

2016-10-19 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589628#comment-15589628
 ] 

Richard Calaba commented on KYLIN-2106:
---

In addition - it seems this bug is not only in the UI - I am not able to build 
a cube using the 'Integer' dictionary encoding -> it always fails in the build 
step 'Create HTable' - in the log of the step there is only:  #result code:2

Cannot provide Diagnostics output - there is another issue causing the 
DiagnosticsCLI to go into an endless loop, flooding kylin.log with errors 
related to some network timeout.

BUT using the same cube and changing the encoding to 'Int (deprecated)' works 
fine -> so definitely some problem with this new dictionary encoding in BOTH 
the UI and the Job Engine.




[jira] [Updated] (KYLIN-2106) UI bug - Advanced Settings - Rowkeys - new Integer dictionary encoding - could possibly impact also cube metadata

2016-10-19 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-2106:
--
Summary: UI bug - Advanced Settings - Rowkeys - new Integer dictionary 
encoding - could possibly impact also cube metadata  (was: UI bug - Advanced 
Settings - Rowkeys - could possibly impact also cube metadata)



[jira] [Created] (KYLIN-2106) UI bug - Advanced Settings - Rowkeys - could possibly impact also cube metadata

2016-10-18 Thread Richard Calaba (JIRA)
Richard Calaba created KYLIN-2106:
-

 Summary: UI bug - Advanced Settings - Rowkeys - could possibly 
impact also cube metadata
 Key: KYLIN-2106
 URL: https://issues.apache.org/jira/browse/KYLIN-2106
 Project: Kylin
  Issue Type: Bug
  Components: Web 
Affects Versions: v1.5.4.1
Reporter: Richard Calaba
Assignee: Zhong,Jason


I have realized a possible bug in the UI; to reproduce:

1) Create a cube that, in Advanced Settings - Rowkeys, has the encoding of 
one of the dimensions (in my case the 1st dimension, customer_id) set to 
'Integer' - I used length 22. (Do not use the deprecated 'Int'.)

2) Save this cube.

3) Clone this cube.

4) The cloned cube had the same dimension in the Rowkeys section in Edit mode 
marked as 'Int (deprecated)', and the length of the Int dictionary encoding 
was set to 'ger' - obviously some issue while parsing the new encoding type - 
hopefully only in the UI ...





[jira] [Commented] (KYLIN-2094) Build Step #3 - java.lang.VerifyError: class com.mapr.fs.proto.Common$ServiceData overrides final method getParserForType.()Lcom/google/protobuf/Parser;

2016-10-18 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584879#comment-15584879
 ] 

Richard Calaba commented on KYLIN-2094:
---

Ok - confirmed - the removal of the kylin-jdbc-*.jar is a workaround for this 
bug. Thank you!

> Build Step #3 - java.lang.VerifyError: class 
> com.mapr.fs.proto.Common$ServiceData overrides final method 
> getParserForType.()Lcom/google/protobuf/Parser;
> 
>
> Key: KYLIN-2094
> URL: https://issues.apache.org/jira/browse/KYLIN-2094
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v1.5.4.1
> Environment: MapR 4.1
>Reporter: Richard Calaba
>Assignee: Dong Li
> Attachments: job_2016_10_13_20_00_25.zip
>
>
> When running a cube build, in step #3 (Step Name: Extract Fact Table Distinct 
> Columns) I am getting an error:
> java.lang.VerifyError: class com.mapr.fs.proto.Common$ServiceData overrides 
> final method getParserForType.()Lcom/google/protobuf/Parser;
>   at java.lang.ClassLoader.defineClass1(Native Method)
>   at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
>   at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>   at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
>   at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   at 
> com.mapr.util.zookeeper.ZKDataRetrieval.addServiceDataToMasterMap(ZKDataRetrieval.java:773)
>   at 
> com.mapr.util.zookeeper.ZKDataRetrieval.getServiceMasterData(ZKDataRetrieval.java:284)
>   at 
> org.apache.hadoop.yarn.client.MapRZKBasedRMFailoverProxyProvider.updateCurrentRMAddress(MapRZKBasedRMFailoverProxyProvider.java:100)
>   at 
> org.apache.hadoop.yarn.client.MapRZKBasedRMFailoverProxyProvider.getProxy(MapRZKBasedRMFailoverProxyProvider.java:174)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:73)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:64)
>   at org.apache.hadoop.io.retry.RetryProxy.create(RetryProxy.java:58)
>   at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:89)
>   at 
> org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:70)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceStart(YarnClientImpl.java:164)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.serviceStart(ResourceMgrDelegate.java:101)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.<init>(ResourceMgrDelegate.java:90)
>   at org.apache.hadoop.mapred.YARNRunner.<init>(YARNRunner.java:114)
>   at 
> org.apache.hadoop.mapred.YarnClientProtocolProvider.create(YarnClientProtocolProvider.java:34)
>   at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:96)
>   at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
>   at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
>   at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
>   at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
>   at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
>   at 
> org.apache.kylin.engine.mr.common.AbstractHadoopJob.waitForCompletion(AbstractHadoopJob.java:150)
>   at 
> org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.run(FactDistinctColumnsJob.java:108)
>   at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:88)
>   at 
> org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:120)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
>   at 
> 

[jira] [Comment Edited] (KYLIN-2094) Build Step #3 - java.lang.VerifyError: class com.mapr.fs.proto.Common$ServiceData overrides final method getParserForType.()Lcom/google/protobuf/Parser;

2016-10-18 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584831#comment-15584831
 ] 

Richard Calaba edited comment on KYLIN-2094 at 10/18/16 8:21 AM:
-

Ok, I have removed the kylin-jdbc-*.jar file from the $KYLIN_HOME/lib 
directory and restarted the server - now waiting for the Cube Build to pass 
through step #3 ...

In regards to the 2nd issue - to be able to query cubes built in the previous 
Kylin version (1.5.3), I had to do the coprocessor migration below (as advised 
at https://kylin.apache.org/docs15/howto/howto_update_coprocessor.html):

$KYLIN_HOME/bin/kylin.sh 
org.apache.kylin.storage.hbase.util.DeployCoprocessorCLI 
$KYLIN_HOME/lib/kylin-coprocessor-*.jar all
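
The redeploy step above assumes the coprocessor jar glob resolves. A small 
guard like the following can check that before invoking the CLI; the fake 
install layout below exists only so the sketch is self-contained, and the 
real CLI call is commented out because it needs a live Kylin/HBase 
environment:

```shell
set -e
# Fake a Kylin install layout so this sketch is self-contained;
# in real use KYLIN_HOME points at the actual installation.
KYLIN_HOME=$(mktemp -d)
mkdir -p "$KYLIN_HOME/lib"
touch "$KYLIN_HOME/lib/kylin-coprocessor-1.5.4.1.jar"

# Resolve the glob and fail early if no coprocessor jar is present.
jar=$(ls "$KYLIN_HOME"/lib/kylin-coprocessor-*.jar 2>/dev/null | head -n1)
if [ -z "$jar" ]; then
    echo "no kylin-coprocessor jar under $KYLIN_HOME/lib" >&2
    exit 1
fi
echo "would deploy: $jar"

# Real invocation (from the comment above; needs a running cluster):
# "$KYLIN_HOME/bin/kylin.sh" \
#     org.apache.kylin.storage.hbase.util.DeployCoprocessorCLI "$jar" all
```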




was (Author: cal...@gmail.com):
Ok, I have removed the kylin-jdbc-*.jar file from the $KYLIN_HOME/lib 
directory and restarted the server - now waiting for the Cube Build to pass 
through step #3 ...

In regards to the 2nd issue - to be able to query cubes built in the previous 
Kylin version (1.5.3), I had to do this in advance, as advised here: 
https://kylin.apache.org/docs15/howto/howto_update_coprocessor.html

$KYLIN_HOME/bin/kylin.sh 
org.apache.kylin.storage.hbase.util.DeployCoprocessorCLI 
$KYLIN_HOME/lib/kylin-coprocessor-*.jar all




[jira] [Comment Edited] (KYLIN-2094) Build Step #3 - java.lang.VerifyError: class com.mapr.fs.proto.Common$ServiceData overrides final method getParserForType.()Lcom/google/protobuf/Parser;

2016-10-18 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584831#comment-15584831
 ] 

Richard Calaba edited comment on KYLIN-2094 at 10/18/16 8:20 AM:
-

Ok, I have removed the kylin-jdbc-*.jar file from the $KYLIN_HOME/lib 
directory and restarted the server - now waiting for the Cube Build to pass 
through step #3 ...

In regards to the 2nd issue - to be able to query cubes built in the previous 
Kylin version (1.5.3), I had to do this in advance, as advised here: 
https://kylin.apache.org/docs15/howto/howto_update_coprocessor.html

$KYLIN_HOME/bin/kylin.sh 
org.apache.kylin.storage.hbase.util.DeployCoprocessorCLI 
$KYLIN_HOME/lib/kylin-coprocessor-*.jar all




was (Author: cal...@gmail.com):
Ok, I have removed the kylin-jdbc-*.jar file from the $KYLIN_HOME/lib 
directory and restarted the server - now waiting for the Cube Build to pass 
through step #3 ...

To be able to query cubes built in the previous Kylin version (1.5.3), I had 
to do this in advance, as advised here: 
https://kylin.apache.org/docs15/howto/howto_update_coprocessor.html

$KYLIN_HOME/bin/kylin.sh 
org.apache.kylin.storage.hbase.util.DeployCoprocessorCLI 
$KYLIN_HOME/lib/kylin-coprocessor-*.jar all




[jira] [Commented] (KYLIN-2094) Build Step #3 - java.lang.VerifyError: class com.mapr.fs.proto.Common$ServiceData overrides final method getParserForType.()Lcom/google/protobuf/Parser;

2016-10-18 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584831#comment-15584831
 ] 

Richard Calaba commented on KYLIN-2094:
---

Ok, I have removed the kylin-jdbc-*.jar file from the $KYLIN_HOME/lib 
directory and restarted the server - now waiting for the Cube Build to pass 
through step #3 ...

To be able to query cubes built in the previous Kylin version (1.5.3), I had 
to do this in advance, as advised here: 
https://kylin.apache.org/docs15/howto/howto_update_coprocessor.html

$KYLIN_HOME/bin/kylin.sh 
org.apache.kylin.storage.hbase.util.DeployCoprocessorCLI 
$KYLIN_HOME/lib/kylin-coprocessor-*.jar all




[jira] [Commented] (KYLIN-2094) Build Step #3 - java.lang.VerifyError: class com.mapr.fs.proto.Common$ServiceData overrides final method getParserForType.()Lcom/google/protobuf/Parser;

2016-10-18 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584803#comment-15584803
 ] 

Richard Calaba commented on KYLIN-2094:
---

Ok - thanks - will try and reconfirm. Another issue is that a cube 
successfully built in Kylin 1.5.3 cannot be queried in 1.5.4.1 (the query 
doesn't return a result) - I think I also saw some protobuf exceptions in the 
logs; hopefully it is the same issue ... going to check now ...



[jira] [Commented] (KYLIN-2104) loader constraint violation: loader (instance of org/apache/catalina/loader/WebappClassLoader) previously initiated loading for a different type with name "com/google/p

2016-10-18 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584777#comment-15584777
 ] 

Richard Calaba commented on KYLIN-2104:
---

FINALLY SEEMS I HAVE A SOLUTION:

Updated apache-maven (previous version: Apache Maven 3.0.5 (Red Hat 
3.0.5-16)) to Apache Maven 3.3.9 
(bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T12:41:47-04:00).

The other (working) server's Maven version was 3.3.3.

I have also deleted all old repository caches in the home directory, for npm 
(~/.npm) and for Maven (~/.m2).

Either the repository cache cleanup or the version update resolved my problem. 
After that I am able to build the BIN version from sources (for any Kylin git 
tag - 1.5.3 / 1.5.4.1 / master) and the compiled BIN version works correctly. 
The BIN package size difference (~20-30 MB) is also gone.

In addition I was able to go back to NodeJS version 6.7.0 - and it still 
works.

Closing ticket as resolved -> most probably a Maven version or Maven 
repository cache issue.

> loader constraint violation: loader (instance of 
> org/apache/catalina/loader/WebappClassLoader) previously initiated loading 
> for a different type with name "com/google/protobuf/ByteString"
> ---
>
> Key: KYLIN-2104
> URL: https://issues.apache.org/jira/browse/KYLIN-2104
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.3
> Environment: MapR 4.1 - Edge node
>Reporter: Richard Calaba
>Priority: Critical
>
> Something very odd is going on with the v1.5.3 compilation & packaging scripts - 
> it seems that during compilation some required library is missing or another 
> version is being used, and this is not reported as a compilation error, which 
> causes issues later at runtime.
> On my MapR 4.1 system - an EDGE node which has all necessary access rights for 
> hbase/hive plus the other packaging tools installed - I did this:
> 1) Followed https://kylin.apache.org/development/howto_package.html - with one 
> exception - from git I am not cloning the latest master branch but a specific 
> released Kylin version using the tag kylin-1.5.3
> 2) The bin package compiles successfully without any errors being reported 
> (I believe test cases are skipped this way, so I cannot say whether they run OK)
> 3) I then installed the successfully compiled Kylin 1.5.3 and ran it - all 
> seems OK.
> 4) I defined and successfully built 2 cubes - no issues during the build 
> process. (Except perhaps that the cube size is reported as 0 KB on the UI 
> despite approx. 350 million rows processed during the build -> that looks 
> more like some other bug).
> 5) If I go to the Insights tab in the Kylin UI and run any query which should 
> return some data (350 mil. rows processed during build), I get an error:
> a) the 1st time I run any query - ERROR: loader constraint violation: loader 
> (instance of org/apache/catalina/loader/WebappClassLoader) previously 
> initiated loading for a different type with name 
> "com/google/protobuf/ByteString"
> b) the 2nd and later times - the ERROR says only "com/google/protobuf/ByteString"
> 6) If I STOP Kylin -> replace the whole binary installation with the 
> officially released binary package of Kylin 1.5.3 (for HBase 0.98/0.99) - I 
> can run my queries without any issue.
> The reason why I am reporting this bug on v1.5.3 and not on the latest 
> released sources, 1.5.4.1, is that I have trouble getting 1.5.4.1 working - see 
> https://issues.apache.org/jira/browse/KYLIN-2094 - the bin release fails in 
> step #3 of the build process, and 1.5.4.1 compiled from sources doesn't work 
> for me.
> Everything points to incorrect dependencies being detected during compilation 
> and/or runtime ... maybe related to Google's Protocol Buffers? Does anyone 
> have any idea how to debug this problem? Basically it makes both 1.5.3 and 
> 1.5.4.1 unusable on my system.
> On a different system (also MapR 4.1) a few months back I didn't have these 
> issues - I was able to successfully re-compile the sources of 1.5.x versions, 
> including some additional patches released for them.
> Because no error is reported during the Kylin compilation & packaging process, 
> everything indicates some strange unresolved dependency which was OK on my 
> previous MapR system but is different on my current MapR system. Could be 
> anything ...
> I will try to attach the compiled binary package here so some guru can have a 
> look and let me know why the "successfully" compiled Kylin from sources 
> doesn't run the same as the original BIN release. (BTW my compiled archive is 
> roughly 30 MB larger than the released binary package ...)



--
This message was sent by Atlassian JIRA

[jira] [Comment Edited] (KYLIN-2104) loader constraint violation: loader (instance of org/apache/catalina/loader/WebappClassLoader) previously initiated loading for a different type with name "com/goo

2016-10-18 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584626#comment-15584626
 ] 

Richard Calaba edited comment on KYLIN-2104 at 10/18/16 6:35 AM:
-

Anyway, this doesn't resolve anything. If I use my own Kylin compiled from 
source, then after trying to run a query I get the following in kylin.log:

1) ===> FIRST ATTEMPT TO RUN QUERY


==[QUERY]===
SQL: select ti.business_date_year, ti.business_date_month, sum(pos_netsales)
from mdl_pos_item as fact
left outer join mdl_timeinfo as ti on ti.business_date = fact.business_date
where fact.business_date between '2016-01-01' and '2016-12-31'
group by ti.business_date_year, ti.business_date_month
order by ti.business_date_year ASC, ti.business_date_month ASC
User: ADMIN
Success: false
Duration: 0.0
Project: jambajuice_3_0
Realization Names: [jambajuice_3_0_MDL_POS_ITEM]
Cuboid Ids: [34]
Total scan count: 0
Result row count: 0
Accept Partial: true
Is Partial Result: false
Hit Exception Cache: false
Storage cache used: false
Message: loader constraint violation: loader (instance of 
org/apache/catalina/loader/WebappClassLoader) previously initiated loading for 
a different type with name "com/google/protobuf/ByteString"
==[QUERY]===

2016-10-18 02:34:25,077 ERROR [http-bio-7070-exec-6] 
controller.BasicController:44 :
org.apache.kylin.rest.exception.InternalErrorException: loader constraint 
violation: loader (instance of org/apache/catalina/loader/WebappClassLoader) 
previously initiated loading for a different type with name 
"com/google/protobuf/ByteString"
at 
org.apache.kylin.rest.controller.QueryController.doQueryWithCache(QueryController.java:224)
at 
org.apache.kylin.rest.controller.QueryController.query(QueryController.java:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:213)
at 
org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:126)
at 
org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:96)
at 
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:617)
at 
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:578)
at 
org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:80)
at 
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
at 
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
at 
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
at 
org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:650)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at 
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
at 
org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:118)
at 
org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:84)
at 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
at 
org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113)
at 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
at 

[jira] [Commented] (KYLIN-2104) loader constraint violation: loader (instance of org/apache/catalina/loader/WebappClassLoader) previously initiated loading for a different type with name "com/google/p

2016-10-18 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584626#comment-15584626
 ] 

Richard Calaba commented on KYLIN-2104:
---

Anyway, this doesn't resolve anything. If I use my own Kylin compiled from 
source, then after trying to run a query I get the following in kylin.log:

==[QUERY]===
SQL: select ti.business_date_year, ti.business_date_month, sum(pos_netsales)
from mdl_pos_item as fact
left outer join mdl_timeinfo as ti on ti.business_date = fact.business_date
where fact.business_date between '2016-01-01' and '2016-12-31'
group by ti.business_date_year, ti.business_date_month
order by ti.business_date_year ASC, ti.business_date_month ASC
User: ADMIN
Success: false
Duration: 0.0
Project: jambajuice_3_0
Realization Names: [jambajuice_3_0_MDL_POS_ITEM]
Cuboid Ids: [34]
Total scan count: 0
Result row count: 0
Accept Partial: true
Is Partial Result: false
Hit Exception Cache: false
Storage cache used: false
Message: com/google/protobuf/ByteString
==[QUERY]===

2016-10-18 02:23:49,232 ERROR [http-bio-7070-exec-3] 
controller.BasicController:44 :
org.apache.kylin.rest.exception.InternalErrorException: 
com/google/protobuf/ByteString
at 
org.apache.kylin.rest.controller.QueryController.doQueryWithCache(QueryController.java:224)
at 
org.apache.kylin.rest.controller.QueryController.query(QueryController.java:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:213)
at 
org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:126)
at 
org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:96)
at 
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:617)
at 
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:578)
at 
org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:80)
at 
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
at 
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
at 
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
at 
org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:650)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at 
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
at 
org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:118)
at 
org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:84)
at 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
at 
org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113)
at 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
at 
org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:103)
at 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
at 
org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:113)
at 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342)
at 

[jira] [Commented] (KYLIN-2104) loader constraint violation: loader (instance of org/apache/catalina/loader/WebappClassLoader) previously initiated loading for a different type with name "com/google/p

2016-10-17 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584255#comment-15584255
 ] 

Richard Calaba commented on KYLIN-2104:
---

As I cannot attach very large files here, you can download the BIN releases 
from here:

1) ORIGINAL Kylin 1.5.3 - 
https://archive.apache.org/dist/kylin/apache-kylin-1.5.3/apache-kylin-1.5.3-bin.tar.gz

2) My version of Kylin 1.5.3 - git clone with tag kylin-1.5.3, compiled & 
packaged from sources - no errors reported - 
https://drive.google.com/file/d/0Bz5GkHbD3o7KcFhIUVh2LWhCeDg/view?usp=sharing

> loader constraint violation: loader (instance of 
> org/apache/catalina/loader/WebappClassLoader) previously initiated loading 
> for a different type with name "com/google/protobuf/ByteString"
> ---
>
> Key: KYLIN-2104
> URL: https://issues.apache.org/jira/browse/KYLIN-2104
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.3
> Environment: MapR 4.1 - Edge node
>Reporter: Richard Calaba
>Priority: Critical
>
> Something very odd is going on with the v1.5.3 compilation & packaging scripts - 
> it seems that during compilation some required library is missing or another 
> version is being used, and this is not reported as a compilation error, which 
> causes issues later at runtime.
> On my MapR 4.1 system - an EDGE node which has all necessary access rights for 
> hbase/hive plus the other packaging tools installed - I did this:
> 1) Followed https://kylin.apache.org/development/howto_package.html - with one 
> exception - from git I am not cloning the latest master branch but a specific 
> released Kylin version using the tag kylin-1.5.3
> 2) The bin package compiles successfully without any errors being reported 
> (I believe test cases are skipped this way, so I cannot say whether they run OK)
> 3) I then installed the successfully compiled Kylin 1.5.3 and ran it - all 
> seems OK.
> 4) I defined and successfully built 2 cubes - no issues during the build 
> process. (Except perhaps that the cube size is reported as 0 KB on the UI 
> despite approx. 350 million rows processed during the build -> that looks 
> more like some other bug).
> 5) If I go to the Insights tab in the Kylin UI and run any query which should 
> return some data (350 mil. rows processed during build), I get an error:
> a) the 1st time I run any query - ERROR: loader constraint violation: loader 
> (instance of org/apache/catalina/loader/WebappClassLoader) previously 
> initiated loading for a different type with name 
> "com/google/protobuf/ByteString"
> b) the 2nd and later times - the ERROR says only "com/google/protobuf/ByteString"
> 6) If I STOP Kylin -> replace the whole binary installation with the 
> officially released binary package of Kylin 1.5.3 (for HBase 0.98/0.99) - I 
> can run my queries without any issue.
> The reason why I am reporting this bug on v1.5.3 and not on the latest 
> released sources, 1.5.4.1, is that I have trouble getting 1.5.4.1 working - see 
> https://issues.apache.org/jira/browse/KYLIN-2094 - the bin release fails in 
> step #3 of the build process, and 1.5.4.1 compiled from sources doesn't work 
> for me.
> Everything points to incorrect dependencies being detected during compilation 
> and/or runtime ... maybe related to Google's Protocol Buffers? Does anyone 
> have any idea how to debug this problem? Basically it makes both 1.5.3 and 
> 1.5.4.1 unusable on my system.
> On a different system (also MapR 4.1) a few months back I didn't have these 
> issues - I was able to successfully re-compile the sources of 1.5.x versions, 
> including some additional patches released for them.
> Because no error is reported during the Kylin compilation & packaging process, 
> everything indicates some strange unresolved dependency which was OK on my 
> previous MapR system but is different on my current MapR system. Could be 
> anything ...
> I will try to attach the compiled binary package here so some guru can have a 
> look and let me know why the "successfully" compiled Kylin from sources 
> doesn't run the same as the original BIN release. (BTW my compiled archive is 
> roughly 30 MB larger than the released binary package ...)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KYLIN-2104) loader constraint violation: loader (instance of org/apache/catalina/loader/WebappClassLoader) previously initiated loading for a different type with name "com/google/pro

2016-10-17 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-2104:
--
Description: 
Something very odd is going on with the v1.5.3 compilation & packaging scripts - it 
seems that during compilation some required library is missing or another version 
is being used, and this is not reported as a compilation error, which causes 
issues later at runtime.

On my MapR 4.1 system - an EDGE node which has all necessary access rights for 
hbase/hive plus the other packaging tools installed - I did this:

1) Followed https://kylin.apache.org/development/howto_package.html - with one 
exception - from git I am not cloning the latest master branch but a specific 
released Kylin version using the tag kylin-1.5.3

2) The bin package compiles successfully without any errors being reported 
(I believe test cases are skipped this way, so I cannot say whether they run OK)

3) I then installed the successfully compiled Kylin 1.5.3 and ran it - all 
seems OK.

4) I defined and successfully built 2 cubes - no issues during the build 
process. (Except perhaps that the cube size is reported as 0 KB on the UI 
despite approx. 350 million rows processed during the build -> that looks more 
like some other bug).

5) If I go to the Insights tab in the Kylin UI and run any query which should 
return some data (350 mil. rows processed during build), I get an error:

a) the 1st time I run any query - ERROR: loader constraint violation: loader 
(instance of org/apache/catalina/loader/WebappClassLoader) previously initiated 
loading for a different type with name "com/google/protobuf/ByteString"

b) the 2nd and later times - the ERROR says only "com/google/protobuf/ByteString"

6) If I STOP Kylin -> replace the whole binary installation with the 
officially released binary package of Kylin 1.5.3 (for HBase 0.98/0.99) - I can 
run my queries without any issue.

The reason why I am reporting this bug on v1.5.3 and not on the latest released 
sources, 1.5.4.1, is that I have trouble getting 1.5.4.1 working - see 
https://issues.apache.org/jira/browse/KYLIN-2094 - the bin release fails in step #3 
of the build process, and 1.5.4.1 compiled from sources doesn't work for me. 

Everything points to incorrect dependencies being detected during compilation 
and/or runtime ... maybe related to Google's Protocol Buffers? Does anyone have 
any idea how to debug this problem? Basically it makes both 1.5.3 and 1.5.4.1 
unusable on my system.

On a different system (also MapR 4.1) a few months back I didn't have these 
issues - I was able to successfully re-compile the sources of 1.5.x versions, 
including some additional patches released for them. 

Because no error is reported during the Kylin compilation & packaging process, 
everything indicates some strange unresolved dependency which was OK on my 
previous MapR system but is different on my current MapR system. Could be 
anything ...

I will try to attach the compiled binary package here so some guru can have a 
look and let me know why the "successfully" compiled Kylin from sources doesn't 
run the same as the original BIN release. (BTW my compiled archive is roughly 
30 MB larger than the released binary package ...)


  was:
Something very odd is going on with the v1.5.3 compilation & packaging scripts - it 
seems that during compilation some required library is missing or another version 
is being used, and this is not reported as a compilation error, which causes 
issues later at runtime.

On my MapR 4.1 system - an EDGE node which has all necessary access rights for 
hbase/hive plus the other packaging tools installed - I did this:

1) Followed https://kylin.apache.org/development/howto_package.html - with one 
exception - from git I am not cloning the latest master branch but a specific 
released Kylin version using the tag kylin-1.5.3

2) The bin package compiles successfully without any errors being reported 
(I believe test cases are skipped this way, so I cannot say whether they run OK)

3) I then installed the successfully compiled Kylin 1.5.3 and ran it - all 
seems OK.

4) I defined and successfully built 2 cubes - no issues during the build 
process. (Except perhaps that the cube size is reported as 0 KB on the UI 
despite approx. 350 million rows processed during the build -> that looks more 
like some other bug).

5) If I go to the Insights tab in the Kylin UI and run any query which should 
return some data (350 mil. rows processed during build), I get an error:

a) the 1st time I run any query - ERROR: loader constraint violation: loader 
(instance of org/apache/catalina/loader/WebappClassLoader) previously initiated 
loading for a different type with name "com/google/protobuf/ByteString"

b) the 2nd and later times - the ERROR says only "com/google/protobuf/ByteString"

6) If I STOP Kylin -> replace the whole binary installation with the 
officially released binary package of Kylin 1.5.3 (for HBase 

[jira] [Updated] (KYLIN-2094) Build Step #3 - java.lang.VerifyError: class com.mapr.fs.proto.Common$ServiceData overrides final method getParserForType.()Lcom/google/protobuf/Parser;

2016-10-13 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-2094:
--
Attachment: job_2016_10_13_20_00_25.zip

> Build Step #3 - java.lang.VerifyError: class 
> com.mapr.fs.proto.Common$ServiceData overrides final method 
> getParserForType.()Lcom/google/protobuf/Parser;
> 
>
> Key: KYLIN-2094
> URL: https://issues.apache.org/jira/browse/KYLIN-2094
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v1.5.4.1
> Environment: MapR 4.1
>Reporter: Richard Calaba
>Assignee: Dong Li
> Attachments: job_2016_10_13_20_00_25.zip
>
>
> When running a Cube Build - in Step #3 (Step Name: Extract Fact Table Distinct 
> Columns) - I am getting an error:
> java.lang.VerifyError: class com.mapr.fs.proto.Common$ServiceData overrides 
> final method getParserForType.()Lcom/google/protobuf/Parser;
>   at java.lang.ClassLoader.defineClass1(Native Method)
>   at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
>   at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>   at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
>   at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   at 
> com.mapr.util.zookeeper.ZKDataRetrieval.addServiceDataToMasterMap(ZKDataRetrieval.java:773)
>   at 
> com.mapr.util.zookeeper.ZKDataRetrieval.getServiceMasterData(ZKDataRetrieval.java:284)
>   at 
> org.apache.hadoop.yarn.client.MapRZKBasedRMFailoverProxyProvider.updateCurrentRMAddress(MapRZKBasedRMFailoverProxyProvider.java:100)
>   at 
> org.apache.hadoop.yarn.client.MapRZKBasedRMFailoverProxyProvider.getProxy(MapRZKBasedRMFailoverProxyProvider.java:174)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:73)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:64)
>   at org.apache.hadoop.io.retry.RetryProxy.create(RetryProxy.java:58)
>   at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:89)
>   at 
> org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:70)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceStart(YarnClientImpl.java:164)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.serviceStart(ResourceMgrDelegate.java:101)
>   at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.<init>(ResourceMgrDelegate.java:90)
>   at org.apache.hadoop.mapred.YARNRunner.<init>(YARNRunner.java:114)
>   at 
> org.apache.hadoop.mapred.YarnClientProtocolProvider.create(YarnClientProtocolProvider.java:34)
>   at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:96)
>   at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
>   at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
>   at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
>   at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
>   at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
>   at 
> org.apache.kylin.engine.mr.common.AbstractHadoopJob.waitForCompletion(AbstractHadoopJob.java:150)
>   at 
> org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.run(FactDistinctColumnsJob.java:108)
>   at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:88)
>   at 
> org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:120)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57)
>   at 
> 

[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-07-27 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396108#comment-15396108
 ] 

Richard Calaba commented on KYLIN-1834:
---

Hmmm, strange - let me try to reproduce it and provide the exact cube metadata 
so you can import it and take a look.

Did you try with Kylin 1.5.2.1, or did you use the latest Kylin sources from git?

> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build 
> Dimension Dictionary
> --
>
> Key: KYLIN-1834
> URL: https://issues.apache.org/jira/browse/KYLIN-1834
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Blocker
> Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>
> Getting exception in Step 4 - Build Dimension Dictionary:
> java.lang.IllegalArgumentException: Value not exists!
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
>   at 
> org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2
> The code which generates the exception is:
> org.apache.kylin.dimension.Dictionary.java:
>  /**
>  * A lower level API, return ID integer from raw value bytes. In case of 
> not found 
>  * 
>  * - if roundingFlag=0, throw IllegalArgumentException; 
>  * - if roundingFlag<0, the closest smaller ID integer if exist; 
>  * - if roundingFlag>0, the closest bigger ID integer if exist. 
>  * 
>  * Bypassing the cache layer, this could be significantly slower than 
> getIdFromValue(T value).
>  * 
>  * @throws IllegalArgumentException
>  * if value is not found in dictionary and rounding is off;
>  * or if rounding cannot find a smaller or bigger ID
>  */
> final public int getIdFromValueBytes(byte[] value, int offset, int len, 
> int roundingFlag) throws IllegalArgumentException {
> if (isNullByteForm(value, offset, len))
> return nullId();
> else {
> int id = getIdFromValueBytesImpl(value, offset, len, 
> roundingFlag);
> if (id < 0)
> throw new IllegalArgumentException("Value not exists!");
> return id;
> }
> } 
> ==
> The Cube is big - fact 110 mio rows, the largest dimension (customer) has 10 
> mio rows. I have increased the JVM -Xmx to 16gb and set the 
> kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube 
> build doesn't fail (previously we were getting exception complaining about 
> the 300MB limit for Dimension dictionary size (req. approx 700MB)).
> ==
> Before that we were getting exception complaining about the Dictionary 
> encoding problem - "Too high cardinality is not suitable for dictionary -- 
> cardinality: 10873977" - this we resolved by changing the affected 
> dimension/row key Encoding from "dict" to "int; length=8" on the Advanced 
> Settings of the Cube.
> 

[jira] [Commented] (KYLIN-1886) When the calculation on measures is supported on Kylin?

2016-07-13 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375763#comment-15375763
 ] 

Richard Calaba commented on KYLIN-1886:
---

I would like to weigh in as well; some extensions that would be helpful:

  - Here - Custom Aggregation Types - 
https://issues.apache.org/jira/browse/KYLIN-976 - if we could have generic basic 
expression aggregation as part of the Kylin standard (and supported in the Kylin 
UI), we might not need the workaround of defining calculated KPIs in views on top 
of our fact/lookup tables

 - This should allow specifying basic arithmetic expressions (+, -, *, /, mod, 
div) - or even more advanced ones - 
and should also support conditional CASE expressions ... as requested here 
https://issues.apache.org/jira/browse/KYLIN-976 - i.e. COUNT(CASE WHEN so.ft = 
'fv' THEN soi.sc ELSE NULL END) or SUM(IF(...))

   - to give an example -> I can compute SUM(a+b) as SUM(a) + SUM(b) today in 
Kylin and SQL - but I cannot compute SUM(a/b) this way, so I have to define it 
at the view level, where it is applied to every row before it gets aggregated 
into a Kylin KPI (measure)
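The non-additivity of ratios can be made concrete with a small sketch (the class and method names are illustrative, not Kylin code): pre-aggregated SUM(a) and SUM(b) cannot reconstruct SUM(a/b).

```java
public class RatioAggregationDemo {
    // SUM(a/b): the ratio is computed per row, then summed.
    static double sumOfRatios(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] / b[i];
        return s;
    }

    // SUM(a)/SUM(b): only the pre-aggregated totals are available,
    // which is all a cube measure normally stores.
    static double ratioOfSums(double[] a, double[] b) {
        double sa = 0, sb = 0;
        for (int i = 0; i < a.length; i++) { sa += a[i]; sb += b[i]; }
        return sa / sb;
    }

    public static void main(String[] args) {
        double[] a = {1, 2}, b = {2, 8};
        System.out.println(sumOfRatios(a, b));  // 0.75
        System.out.println(ratioOfSums(a, b));  // 0.3
    }
}
```

The two results differ (0.75 vs 0.3), which is why the per-row ratio has to be computed before aggregation, e.g. in a view.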
  
From https://issues.apache.org/jira/browse/KYLIN-976 I read that there is a way 
coders can provide custom aggregation types; I didn't have time to check and 
test this approach, but maybe this is the way we can achieve it. A generic 
extension for Kylin could then be:

- having an option in the UI to define a calculated measure (inputs are: an 
expression string; the set of required measures/domains used in the expression; 
and a jar file with the custom aggregation implementation)

- the implementation can then read values from other measures/domains on the 
row (passed by the Kylin cube build engine) - yes, we might need to resolve the 
problem of loops between calculated measures; maybe we allow reading only 
already-defined measures (so the order of measure definition becomes important)

Once we have this -> I am pretty sure someone will quickly implement a generic 
expression parser and evaluator, and Kylin can easily be enhanced with 
calculated fields

> When the calculation on measures is supported on Kylin?
> ---
>
> Key: KYLIN-1886
> URL: https://issues.apache.org/jira/browse/KYLIN-1886
> Project: Kylin
>  Issue Type: Test
>Reporter: Rahul Choubey
>
> Suppose we have two measures and we want to do some calculation on top of 
> these measures, instead of doing it on the fly in the SQL query after the cube 
> is built. Currently this is not supported in Kylin; in which version are we 
> planning to have this feature?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1786) Frontend work for KYLIN-1313 (extended columns as measure)

2016-07-13 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375157#comment-15375157
 ] 

Richard Calaba commented on KYLIN-1786:
---

Great, is there a chance to generate a patch for 1.5.2.1 so I can test it on 
the latest released version?

> Frontend work for KYLIN-1313 (extended columns as measure)
> --
>
> Key: KYLIN-1786
> URL: https://issues.apache.org/jira/browse/KYLIN-1786
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Reporter: Dong Li
>Assignee: Zhong,Jason
> Fix For: v1.5.3
>
> Attachments: 屏幕快照 2016-06-15 12.22.54.png
>
>
> KYLIN-1313 introduced a measure called extendedcolumn, but seems not enabled 
> on WebUI, see attached screenshot.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-07-12 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372226#comment-15372226
 ] 

Richard Calaba commented on KYLIN-1834:
---

Thanx, enjoy your vacation :). If you have issues reproducing the bug, let me 
know.

> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build 
> Dimension Dictionary
> --
>
> Key: KYLIN-1834
> URL: https://issues.apache.org/jira/browse/KYLIN-1834
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Blocker
> Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>
> Getting exception in Step 4 - Build Dimension Dictionary:
> java.lang.IllegalArgumentException: Value not exists!
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
>   at 
> org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2
> The code which generates the exception is:
> org.apache.kylin.dimension.Dictionary.java:
>  /**
>  * A lower level API, return ID integer from raw value bytes. In case of 
> not found 
>  * 
>  * - if roundingFlag=0, throw IllegalArgumentException; 
>  * - if roundingFlag<0, the closest smaller ID integer if exist; 
>  * - if roundingFlag>0, the closest bigger ID integer if exist. 
>  * 
>  * Bypassing the cache layer, this could be significantly slower than 
> getIdFromValue(T value).
>  * 
>  * @throws IllegalArgumentException
>  * if value is not found in dictionary and rounding is off;
>  * or if rounding cannot find a smaller or bigger ID
>  */
> final public int getIdFromValueBytes(byte[] value, int offset, int len, 
> int roundingFlag) throws IllegalArgumentException {
> if (isNullByteForm(value, offset, len))
> return nullId();
> else {
> int id = getIdFromValueBytesImpl(value, offset, len, 
> roundingFlag);
> if (id < 0)
> throw new IllegalArgumentException("Value not exists!");
> return id;
> }
> } 
> ==
> The Cube is big - fact 110 mio rows, the largest dimension (customer) has 10 
> mio rows. I have increased the JVM -Xmx to 16gb and set the 
> kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube 
> build doesn't fail (previously we were getting exception complaining about 
> the 300MB limit for Dimension dictionary size (req. approx 700MB)).
> ==
> Before that we were getting exception complaining about the Dictionary 
> encoding problem - "Too high cardinality is not suitable for dictionary -- 
> cardinality: 10873977" - this we resolved by changing the affected 
> dimension/row key Encoding from "dict" to "int; length=8" on the Advanced 
> Settings of the Cube.
> ==
> We have 2 high-cardinality fields (one from fact table and 

[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-07-11 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370214#comment-15370214
 ] 

Richard Calaba commented on KYLIN-1834:
---

Ok, I have uploaded the 50MB CSV (compressed to 15MB by 7z and 20MB by bzip2) 
file with customer ids (bigint) to my drive here: 

7z: 
https://drive.google.com/file/d/0Bz5GkHbD3o7Kd2JfUmZtYjNMX0k/view?usp=sharing
bzip2: 
https://drive.google.com/file/d/0Bz5GkHbD3o7KRXZlYUdHWW85RlU/view?usp=sharing

It should contain 13 645 863 bigints. I didn't check the count. It seems the 
last byte of the file is hex 1A (EOF) - thus the last bigint in the CSV might 
not be read correctly -> maybe filter it out.

To make it work in Kylin I had to increase the max. snapshot size (with all the 
fields it was over 700MB) - if you use only the bigints it might be a little 
less ... 

To allow processing of high-cardinality customer table in Kylin 1.5.2.1 I did 
this:

1) In conf/kylin.properties added kylin.table.snapshot.max_mb=2048
2) In bin/setenv.sh set KYLIN_JVM_PROPERTIES=-Xmx16g (the original 4096m 
wasn't enough; 8g should do)
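For reference, the two changes look roughly like this (a sketch based on a standard Kylin 1.5.x binary package layout; the exact variable name in setenv.sh may differ between releases):

```
# conf/kylin.properties - raise the lookup-table snapshot limit (default 300 MB)
kylin.table.snapshot.max_mb=2048

# bin/setenv.sh - give the job engine JVM more heap
KYLIN_JVM_PROPERTIES="-Xmx16g"
```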

count: 13645863







> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build 
> Dimension Dictionary
> --
>
> Key: KYLIN-1834
> URL: https://issues.apache.org/jira/browse/KYLIN-1834
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Blocker
> Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>
> Getting exception in Step 4 - Build Dimension Dictionary:
> java.lang.IllegalArgumentException: Value not exists!
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
>   at 
> org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2
> The code which generates the exception is:
> org.apache.kylin.dimension.Dictionary.java:
>  /**
>  * A lower level API, return ID integer from raw value bytes. In case of 
> not found 
>  * 
>  * - if roundingFlag=0, throw IllegalArgumentException; 
>  * - if roundingFlag<0, the closest smaller ID integer if exist; 
>  * - if roundingFlag>0, the closest bigger ID integer if exist. 
>  * 
>  * Bypassing the cache layer, this could be significantly slower than 
> getIdFromValue(T value).
>  * 
>  * @throws IllegalArgumentException
>  * if value is not found in dictionary and rounding is off;
>  * or if rounding cannot find a smaller or bigger ID
>  */
> final public int getIdFromValueBytes(byte[] value, int offset, int len, 
> int roundingFlag) throws IllegalArgumentException {
> if (isNullByteForm(value, offset, len))
> return nullId();
> else {
> int id = getIdFromValueBytesImpl(value, offset, len, 
> roundingFlag);
> if (id < 0)
> throw new IllegalArgumentException("Value not exists!");
> return id;
> }
> } 
> ==
> The Cube is 

[jira] [Comment Edited] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-07-10 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15369901#comment-15369901
 ] 

Richard Calaba edited comment on KYLIN-1834 at 7/10/16 7:08 PM:


Customer ID is the 1st field - BIGINT:

-9223372036854775808 (BIGINT MIN) < -2857007631392161431 < 9223372036854775807 (BIGINT MAX)

"-2857007631392161431" is a perfectly fine BigInt value ...

In addition I was getting the same exception reporting a positive bigint number 
as invalid as well.



was (Author: cal...@gmail.com):
Customer ID -s the 1st field - BIGINT:

-9223372036854775808< -2857007631392161431 < 9223372036854775807
 BIGINT MINBIGINT MAX

-9223372036854775808
-2857007631392161431 

"-2857007631392161431" is perfrctly fine BigInt value ...



> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build 
> Dimension Dictionary
> --
>
> Key: KYLIN-1834
> URL: https://issues.apache.org/jira/browse/KYLIN-1834
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Blocker
> Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>
> Getting exception in Step 4 - Build Dimension Dictionary:
> java.lang.IllegalArgumentException: Value not exists!
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
>   at 
> org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2
> The code which generates the exception is:
> org.apache.kylin.dimension.Dictionary.java:
>  /**
>  * A lower level API, return ID integer from raw value bytes. In case of 
> not found 
>  * 
>  * - if roundingFlag=0, throw IllegalArgumentException; 
>  * - if roundingFlag<0, the closest smaller ID integer if exist; 
>  * - if roundingFlag>0, the closest bigger ID integer if exist. 
>  * 
>  * Bypassing the cache layer, this could be significantly slower than 
> getIdFromValue(T value).
>  * 
>  * @throws IllegalArgumentException
>  * if value is not found in dictionary and rounding is off;
>  * or if rounding cannot find a smaller or bigger ID
>  */
> final public int getIdFromValueBytes(byte[] value, int offset, int len, 
> int roundingFlag) throws IllegalArgumentException {
> if (isNullByteForm(value, offset, len))
> return nullId();
> else {
> int id = getIdFromValueBytesImpl(value, offset, len, 
> roundingFlag);
> if (id < 0)
> throw new IllegalArgumentException("Value not exists!");
> return id;
> }
> } 
> ==
> The Cube is big - fact 110 mio rows, the largest dimension (customer) has 10 
> mio rows. I have increased the JVM -Xmx to 16gb and set the 
> kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube 
> 

[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-07-10 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15369901#comment-15369901
 ] 

Richard Calaba commented on KYLIN-1834:
---

Customer ID is the 1st field - BIGINT:

-9223372036854775808 (BIGINT MIN) < -2857007631392161431 < 9223372036854775807 (BIGINT MAX)

"-2857007631392161431" is a perfectly fine BigInt value ...



> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build 
> Dimension Dictionary
> --
>
> Key: KYLIN-1834
> URL: https://issues.apache.org/jira/browse/KYLIN-1834
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Blocker
> Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>
> Getting exception in Step 4 - Build Dimension Dictionary:
> java.lang.IllegalArgumentException: Value not exists!
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
>   at 
> org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2
> The code which generates the exception is:
> org.apache.kylin.dimension.Dictionary.java:
>  /**
>  * A lower level API, return ID integer from raw value bytes. In case of 
> not found 
>  * 
>  * - if roundingFlag=0, throw IllegalArgumentException; 
>  * - if roundingFlag<0, the closest smaller ID integer if exist; 
>  * - if roundingFlag>0, the closest bigger ID integer if exist. 
>  * 
>  * Bypassing the cache layer, this could be significantly slower than 
> getIdFromValue(T value).
>  * 
>  * @throws IllegalArgumentException
>  * if value is not found in dictionary and rounding is off;
>  * or if rounding cannot find a smaller or bigger ID
>  */
> final public int getIdFromValueBytes(byte[] value, int offset, int len, 
> int roundingFlag) throws IllegalArgumentException {
> if (isNullByteForm(value, offset, len))
> return nullId();
> else {
> int id = getIdFromValueBytesImpl(value, offset, len, 
> roundingFlag);
> if (id < 0)
> throw new IllegalArgumentException("Value not exists!");
> return id;
> }
> } 
> ==
> The Cube is big - fact 110 mio rows, the largest dimension (customer) has 10 
> mio rows. I have increased the JVM -Xmx to 16gb and set the 
> kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube 
> build doesn't fail (previously we were getting exception complaining about 
> the 300MB limit for Dimension dictionary size (req. approx 700MB)).
> ==
> Before that we were getting exception complaining about the Dictionary 
> encoding problem - "Too high cardinality is not suitable for dictionary -- 
> cardinality: 10873977" - this we resolved by changing the affected 
> dimension/row key Encoding 

[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-07-09 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15369163#comment-15369163
 ] 

Richard Calaba commented on KYLIN-1834:
---

This is the record in the lookup table which gets the "Value not found" 
exception in the log, for value: -2857007631392161431 (customer_id - bigint).

This is the record from the source lookup table - nothing special, only a lot 
of NULLs:

Col-Types: BIGINT_TYPE, STRING_TYPE, INT_TYPE, INT_TYPE, INT_TYPE, INT_TYPE, 
INT_TYPE, STRING_TYPE, STRING_TYPE, STRING_TYPE, STRING_TYPE, STRING_TYPE, 
STRING_TYPE, STRING_TYPE, STRING_TYPE, STRING_TYPE, STRING_TYPE, STRING_TYPE, 
STRING_TYPE, STRING_TYPE, STRING_TYPE, BIGINT_TYPE, STRING_TYPE, STRING_TYPE, 
STRING_TYPE, STRING_TYPE, STRING_TYPE, STRING_TYPE, BIGINT_TYPE, STRING_TYPE

Col-Values: -2857007631392161431
9526ea3e-1359-45db-b872-8c47a9df3e46-28570076313921614311   1   
NULLNULLNULLNULLNULLNULLNULLJoe NULL
xx...@yahoo.com Joe Joe Joe DoeDoeD NULLNULL80040   NULL
NULLNULLNULLNULLNULLNULLNULLNULLNULL
2016-06-23



> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build 
> Dimension Dictionary
> --
>
> Key: KYLIN-1834
> URL: https://issues.apache.org/jira/browse/KYLIN-1834
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Blocker
> Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>
> Getting exception in Step 4 - Build Dimension Dictionary:
> java.lang.IllegalArgumentException: Value not exists!
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
>   at 
> org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2
> The code which generates the exception is:
> org.apache.kylin.dimension.Dictionary.java:
>  /**
>  * A lower level API, return ID integer from raw value bytes. In case of 
> not found 
>  * 
>  * - if roundingFlag=0, throw IllegalArgumentException; 
>  * - if roundingFlag<0, the closest smaller ID integer if exist; 
>  * - if roundingFlag>0, the closest bigger ID integer if exist. 
>  * 
>  * Bypassing the cache layer, this could be significantly slower than 
> getIdFromValue(T value).
>  * 
>  * @throws IllegalArgumentException
>  * if value is not found in dictionary and rounding is off;
>  * or if rounding cannot find a smaller or bigger ID
>  */
> final public int getIdFromValueBytes(byte[] value, int offset, int len, 
> int roundingFlag) throws IllegalArgumentException {
> if (isNullByteForm(value, offset, len))
> return nullId();
> else {
> int id = getIdFromValueBytesImpl(value, offset, len, 
> roundingFlag);
> if (id < 0)
> throw new IllegalArgumentException("Value not exists!");
> return id;
> }
> } 
> 

[jira] [Commented] (KYLIN-1827) Send mail notification when runtime exception throws during build/merge cube

2016-07-08 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368459#comment-15368459
 ] 

Richard Calaba commented on KYLIN-1827:
---

Ok, thanx, I have successfully tested this - but found one problem: the subject 
line of the email comes as:

[ERROR] - [envName] - [projectName] - 
test_JAMBAJUICE_3_0_REL_TRX_POS_CHECK_W_TDC

-- not sure where the envName should be filled in
-- also the projectName is not correctly resolved to my project name

- otherwise the rest looks fine - it would be cool to have a URL link to go to 
the UI / step which failed (nice to have) :)

So are there any settings for those notification templates?

> Send mail notification when runtime exception throws during build/merge cube
> 
>
> Key: KYLIN-1827
> URL: https://issues.apache.org/jira/browse/KYLIN-1827
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v1.5.1, v1.5.2
>Reporter: Ma Gang
>Assignee: Ma Gang
>
> Currently a mail notification is only sent in the onExecuteFinished() method, 
> but no notification is sent when a RuntimeException is thrown; that may cause 
> users to miss important job build failures, especially for automated merge 
> jobs. Sometimes the job state update fails (the hbase metastore is unavailable 
> for a short time), which makes the job always look like it is in a running 
> state while it has actually failed; we should send mail to notify the user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1844) Hold huge dictionary in 2nd storage like disk/hbase

2016-07-08 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368321#comment-15368321
 ] 

Richard Calaba commented on KYLIN-1844:
---

Wasn't there an option in previous versions of Kylin to switch off the encoding 
completely? I believe I saw some old discussion on this topic ... 

It seems I always have to encode a dimension - and Dict is the default.

> Hold huge dictionary in 2nd storage like disk/hbase
> ---
>
> Key: KYLIN-1844
> URL: https://issues.apache.org/jira/browse/KYLIN-1844
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v1.2, v1.5.2
>Reporter: Abhilash L L
>Assignee: liyang
>
> A whole dimension is kept in memory.
> We should have a way to keep only a certain number / size of the total rows 
> in memory. An LRU cache for rows in the dimension will help keep memory in 
> check.
> Why not store all the dimension data in hbase in a different table with a 
> prefix of the dimension id, and map all calls to the dimensions (get based on 
> dim key) to hbase?
> This does mean a miss will cost more time.
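The LRU idea suggested above can be sketched with LinkedHashMap in access-order mode (an illustrative sketch, not Kylin code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal LRU cache for dictionary rows: LinkedHashMap in
// access-order mode evicts the least recently used entry once
// the cache grows beyond maxEntries.
public class RowLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public RowLruCache(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

With a bound of 2, inserting a third row evicts the least recently used one; a real implementation would fall back to hbase/disk on a miss, as the issue proposes.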



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KYLIN-1835) Error: java.lang.NumberFormatException: For input count_distinct on Big Int ??? (#7 Step Name: Build Base Cuboid Data)

2016-07-08 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368316#comment-15368316
 ] 

Richard Calaba edited comment on KYLIN-1835 at 7/8/16 7:53 PM:
---

[~liyang.g...@gmail.com]: You mean I have to map the BigInt ID to Int, i.e. 
using a view on top of the lookup? 

To achieve it I would have to sort all the values and assign each a row number 
-> this way I know I have a mapping without collisions ... any other faster / 
better way?

And I would have to do the view on both fact and lookup.
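The sort-and-row-number approach described above, sketched in Java (hypothetical names; assumes the distinct count fits in a 32-bit int, as it must for this mapping to work at all):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class SurrogateKeyDemo {
    // Sort the distinct BIGINT ids and assign each its rank as a
    // collision-free 32-bit surrogate key.
    static Map<Long, Integer> buildMapping(long[] ids) {
        long[] sorted = Arrays.stream(ids).distinct().sorted().toArray();
        Map<Long, Integer> m = new HashMap<>();
        for (int i = 0; i < sorted.length; i++) m.put(sorted[i], i);
        return m;
    }

    public static void main(String[] args) {
        Map<Long, Integer> m =
            buildMapping(new long[]{-2857007631392161431L, 42L, 42L});
        System.out.println(m.get(-2857007631392161431L) + " " + m.get(42L)); // 0 1
    }
}
```

In practice the same ranking would be expressed in the Hive views over both the fact and lookup tables, so both sides agree on the surrogate.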


was (Author: cal...@gmail.com):
[~liyang.g...@gmail.com]: You mean I have to map the BigInt ID to Int i.e. 
using view on top of the lookup ??? 

To achieve it I would have to sort all the values and assign to it a row number 
i.e. -> this way I know I have mapping without collisions ... any other faster 
/better way  ??

> Error: java.lang.NumberFormatException: For input count_distinct on Big Int 
> ??? (#7 Step Name: Build Base Cuboid Data)
> --
>
> Key: KYLIN-1835
> URL: https://issues.apache.org/jira/browse/KYLIN-1835
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Minor
>
> I believe I have discovered an error in Kylin related to count_distinct with 
> exact precision.
> I am not 100% sure - but everything points to there being a design limit 
> for count_distinct ... please assess / confirm / reject my observation.
> Background info:
> =
> - large fact table ~ 100 mio rows.
> - large customer dimension ~ 10 mio rows
> Defined 2 KPIs of type COUNT_DISTINCT - with exact precision (return type 
> bitmap) on 2 high-cardinality fields of type Bigint (# of values expected for 
> one measure max 15 000 000 distinct values ; 2nd measure can have more 
> distinct values ~ approx. 50 mil (just an estimate). 
> Error info:
> 
> Cube Build runs fine till #7 Step Name: Build Base Cuboid Data - where it 
> errors out without further details in Kylin Log - it shows only "no counters 
> for job job_1463699962519_16085".
> The MR logs of the job job_1463699962519_16085 show these exceptions:
> 2016-06-28 02:22:24,019 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.NumberFormatException: For input string: 
> "-6628245177096591402"
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:495)
>   at java.lang.Integer.parseInt(Integer.java:527)
>   at 
> org.apache.kylin.measure.bitmap.BitmapCounter.add(BitmapCounter.java:63)
>   at 
> org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:106)
>   at 
> org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:98)
>   at 
> org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValueOf(BaseCuboidMapperBase.java:189)
>   at 
> org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValue(BaseCuboidMapperBase.java:159)
>   at 
> org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.outputKV(BaseCuboidMapperBase.java:206)
>   at 
> org.apache.kylin.engine.mr.steps.HiveToBaseCuboidMapper.map(HiveToBaseCuboidMapper.java:53)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Just reading the signature of the exception and connecting it to the measure 
> return type "bitmap" => it looks like choosing exact precision (which the UI 
> says is supported for int types) causes this exception because I am passing a 
> Bigint field.
> If so -> is that a bug (refactoring for big int needed) or is it a design 
> limitation? Can count_distinct be implemented for bigint (with exact 
> precision), or do I have to use count_distinct with an error rate instead?
> In case I do not need to calculate the count_distinct for all dimensions 
> combinations -  I might add some mandatory dimensions to the aggregation 
> group - but not sure if this would resolve this issue (assuming I keep the 
> exact precision counts) ... ???
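
The 32-bit limit behind the stack trace above can be reproduced in isolation. This is only an illustration, not Kylin code: the trace shows BitmapCounter.add calling Integer.parseInt, so any BIGINT key outside the int range fails in exactly this way, while a 64-bit long holds the same value without trouble:

```java
// Illustration only (not Kylin code): int parsing in the bitmap measure vs. a BIGINT key.
public class BigintBitmapLimit {
    public static void main(String[] args) {
        String bigintKey = "-6628245177096591402"; // the value from the MR log above
        try {
            // BitmapCounter.add(String) goes through Integer.parseInt (see the stack
            // trace), and this 19-digit value is far outside the 32-bit int range
            Integer.parseInt(bigintKey);
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException: " + e.getMessage());
        }
        // the same value parses fine as a 64-bit long
        System.out.println(Long.parseLong(bigintKey)); // prints -6628245177096591402
    }
}
```

This is consistent with the suggestion elsewhere in the thread to map the BigInt ID down to an Int before using an exact-precision count distinct.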



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1835) Error: java.lang.NumberFormatException: For input count_distinct on Big Int ??? (#7 Step Name: Build Base Cuboid Data)

2016-07-08 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368316#comment-15368316
 ] 

Richard Calaba commented on KYLIN-1835:
---

[~liyang.g...@gmail.com]: You mean I have to map the BigInt ID to Int, i.e. 
using a view on top of the lookup? 

To achieve that I would have to sort all the values and assign each one a row 
number -> this way I know the mapping has no collisions ... is there any other 
faster/better way?

> Error: java.lang.NumberFormatException: For input count_distinct on Big Int 
> ??? (#7 Step Name: Build Base Cuboid Data)
> --
>
> Key: KYLIN-1835
> URL: https://issues.apache.org/jira/browse/KYLIN-1835
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Minor
>
> I believe I have discovered an error in Kylin related to count_distinct with 
> exact precision.
> I am not 100% sure - but everything points to the fact that there is a design 
> limit for count_distinct ... please assess / confirm / reject my observation.
> Background info:
> =
> - large fact table ~ 100 mio rows.
> - large customer dimension ~ 10 mio rows
> Defined 2 KPIs of type COUNT_DISTINCT - with exact precision (return type 
> bitmap) on 2 high-cardinality fields of type Bigint (# of values expected for 
> one measure max 15 000 000 distinct values ; 2nd measure can have more 
> distinct values ~ approx. 50 mil (just an estimate). 
> Error info:
> 
> Cube Build runs fine till #7 Step Name: Build Base Cuboid Data - where it 
> errors out without further details in Kylin Log - it shows only "no counters 
> for job job_1463699962519_16085".
> The MR logs of the job job_1463699962519_16085 show these exceptions:
> 2016-06-28 02:22:24,019 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.NumberFormatException: For input string: 
> "-6628245177096591402"
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:495)
>   at java.lang.Integer.parseInt(Integer.java:527)
>   at 
> org.apache.kylin.measure.bitmap.BitmapCounter.add(BitmapCounter.java:63)
>   at 
> org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:106)
>   at 
> org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:98)
>   at 
> org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValueOf(BaseCuboidMapperBase.java:189)
>   at 
> org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValue(BaseCuboidMapperBase.java:159)
>   at 
> org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.outputKV(BaseCuboidMapperBase.java:206)
>   at 
> org.apache.kylin.engine.mr.steps.HiveToBaseCuboidMapper.map(HiveToBaseCuboidMapper.java:53)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Just reading the signature of the exception and connecting it to the measure 
> return type "bitmap" => it looks like choosing exact precision (which the UI 
> says is supported for int types) causes this exception because I am passing a 
> Bigint field.
> If so -> is that a bug (refactoring for big int needed) or is it a design 
> limitation? Can count_distinct be implemented for bigint (with exact 
> precision), or do I have to use count_distinct with an error rate instead?
> In case I do not need to calculate the count_distinct for all dimensions 
> combinations -  I might add some mandatory dimensions to the aggregation 
> group - but not sure if this would resolve this issue (assuming I keep the 
> exact precision counts) ... ???



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KYLIN-1863) Discard the Jobs while Droping/Purging the Cube

2016-07-08 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-1863:
--
Description: 
I have observed that the following scenario on the UI leaves uncleaned metadata 
in Kylin:

1) I have an error status job in Monitor for my Cube. I drop the cube from the 
UI. I still see the error status jobs in Monitor after dropping the Cube. If I 
try to Discard the job -> I am getting an NPE. I didn't test the same with Purge 
used instead of Drop - but this needs to be checked as well.

2) Not 100% sure - but I have a feeling that if I Drop a cube from the UI before 
Purging it first - some job execution metadata (finished build jobs) stays in the 
system ... (intermediate tables/HDFS folders/...). It is hard to find proof 
now when my system is polluted with old job executions. This could be checked 
while working on 1) above.

  was:
I have observed that following scenario on UI leaves uncleaned meta-data in 
Kylin:

1) I have a error status job in Monitor for my Cube. I drop the cube from UI. I 
still see the error status jobs in Monitor after Dropping the Cube. If I try to 
Discard the job -> I am getting NPE. Didn't test the same if Purge used instead 
of Drop - but this needs to be checked as well.

2) Not 100% sure - but I have a feeling that if I Drop cube from UI before 
Purging it 1st - some job execution metadata (finished build jobs) stay in the 
system ... (intermediate tables/HDFS folders/...). It is hard to find a prove 
now when my system is polluted with old job executions. This could be checked 
while working on 1) above.


> Discard the Jobs while Droping/Purging the Cube
> ---
>
> Key: KYLIN-1863
> URL: https://issues.apache.org/jira/browse/KYLIN-1863
> Project: Kylin
>  Issue Type: Bug
>Reporter: Richard Calaba
> Fix For: all, v1.5.2, v1.5.2.1
>
>
> I have observed that the following scenario on the UI leaves uncleaned metadata 
> in Kylin:
> 1) I have an error status job in Monitor for my Cube. I drop the cube from 
> the UI. I still see the error status jobs in Monitor after dropping the Cube. 
> If I try to Discard the job -> I am getting an NPE. I didn't test the same with 
> Purge used instead of Drop - but this needs to be checked as well.
> 2) Not 100% sure - but I have a feeling that if I Drop a cube from the UI 
> before Purging it first - some job execution metadata (finished build jobs) 
> stays in the system ... (intermediate tables/HDFS folders/...). It is hard to 
> find proof now when my system is polluted with old job executions. This could 
> be checked while working on 1) above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KYLIN-1863) Discard the Jobs while Droping/Purging the Cube

2016-07-08 Thread Richard Calaba (JIRA)
Richard Calaba created KYLIN-1863:
-

 Summary: Discard the Jobs while Droping/Purging the Cube
 Key: KYLIN-1863
 URL: https://issues.apache.org/jira/browse/KYLIN-1863
 Project: Kylin
  Issue Type: Bug
Reporter: Richard Calaba
 Fix For: all, v1.5.2.1, v1.5.2


I have observed that the following scenario on the UI leaves uncleaned metadata 
in Kylin:

1) I have an error status job in Monitor for my Cube. I drop the cube from the 
UI. I still see the error status jobs in Monitor after dropping the Cube. If I 
try to Discard the job -> I am getting an NPE. I didn't test the same with Purge 
used instead of Drop - but this needs to be checked as well.

2) Not 100% sure - but I have a feeling that if I Drop a cube from the UI before 
Purging it first - some job execution metadata (finished build jobs) stays in the 
system ... (intermediate tables/HDFS folders/...). It is hard to find proof 
now when my system is polluted with old job executions. This could be checked 
while working on 1) above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-07-08 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367908#comment-15367908
 ] 

Richard Calaba commented on KYLIN-1834:
---

Hello liyang,

thank you for the hints. I do not think the snapshot was modified - and if it 
was, that has to be another bug in Kylin. We are not loading any data into those 
tables anymore. I have tested the same scenario several times on several cubes 
(sharing the model). I was running into the same issue when using both views and 
tables for the fact and dimension/lookup tables. I was also running into the 
same issue while building only one cube at a time or building several cubes at a 
time (based on the same model).

Several times I not only restarted the failed step but ran the cube build again 
from scratch.

Do I assume correctly that the "Value not exists!" exception is related only to 
the lookup table content? So values in the fact table (for that dimension) are 
not involved, right? If so -> then I can be pretty sure the lookup didn't 
change, and the error still occurs. 

So overall I am 99.9% positive that this (a value added to the snapshot table of 
the lookup) is not the cause (unless Kylin itself has some inconsistency issues 
and modifies/doesn't persist the whole snapshot).

> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build 
> Dimension Dictionary
> --
>
> Key: KYLIN-1834
> URL: https://issues.apache.org/jira/browse/KYLIN-1834
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Blocker
> Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>
> Getting exception in Step 4 - Build Dimension Dictionary:
> java.lang.IllegalArgumentException: Value not exists!
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
>   at 
> org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2
> The code which generates the exception is:
> org.apache.kylin.dimension.Dictionary.java:
>  /**
>  * A lower level API, return ID integer from raw value bytes. In case of 
> not found 
>  * 
>  * - if roundingFlag=0, throw IllegalArgumentException; 
>  * - if roundingFlag<0, the closest smaller ID integer if exist; 
>  * - if roundingFlag>0, the closest bigger ID integer if exist. 
>  * 
>  * Bypassing the cache layer, this could be significantly slower than 
> getIdFromValue(T value).
>  * 
>  * @throws IllegalArgumentException
>  * if value is not found in dictionary and rounding is off;
>  * or if rounding cannot find a smaller or bigger ID
>  */
> final public int getIdFromValueBytes(byte[] value, int offset, int len, 
> int roundingFlag) throws IllegalArgumentException {
> if (isNullByteForm(value, offset, len))
> return nullId();
> else {
> int id = getIdFromValueBytesImpl(value, offset, len, 
> roundingFlag);
> if (id < 0)
> throw new 

[jira] [Commented] (KYLIN-1642) UI Refresh needed after Purging the Cube before Build ...

2016-07-08 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367843#comment-15367843
 ] 

Richard Calaba commented on KYLIN-1642:
---

As https://issues.apache.org/jira/browse/KYLIN-1647 is closed, I believe 
this is also resolved.

> UI Refresh needed after Purging the Cube before Build ... 
> --
>
> Key: KYLIN-1642
> URL: https://issues.apache.org/jira/browse/KYLIN-1642
> Project: Kylin
>  Issue Type: Bug
>  Components: General, Web 
>Affects Versions: all, v1.5.1
>Reporter: Richard Calaba
>Priority: Trivial
>
> Hello, minor bug on Web UI of Kylin discovered:
> - After calling Purge on Cube and then trying immediately Build the cube 
> again (and selecting a Today as the time dimension selection for new segment) 
> I was getting an error that the selected date (of the new segment) should be 
> bigger than the date of last loaded segment - which was already Purged ...
> - I had to refresh the Web UI in order to be able to schedule the Build of 
> the empty cube 
> Seems the Purge needs to refresh the Web UI metadata about the loaded segments 
> ...
> Observed on version 1.5.1 but assuming all affected ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1830) Put KYLIN_JVM_SETTINGS to kylin.properties

2016-07-07 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367146#comment-15367146
 ] 

Richard Calaba commented on KYLIN-1830:
---

Fair enough - I had no idea, so I did my own property parsing to support 
commented parameters, parameters in quotes, splitting only after the 1st '=' 
sign, etc. ... If all this is supported by bin/get-properties.sh then fine -> it 
will be even faster, as my code scans the properties file every time 
export_property_override is used instead of export. 

It was designed this way to make the old setenv.sh and the new one very 
similar (export_property_override used instead of a plain export =... 
assignment).

Also my implementation supports specifying a default value in the script, so it 
doesn't have to be overridden in the kylin.properties file.
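
The default-value behavior described above can be sketched with plain java.util.Properties (a hypothetical analogy, not the actual get-properties.sh mechanism): the second argument of getProperty plays the role of the in-script default that kylin.properties may override.

```java
import java.io.StringReader;
import java.util.Properties;

public class PropertyOverrideSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // pretend kylin.properties overrides only one of the two keys
        props.load(new StringReader("kylin.table.snapshot.max_mb=2048\n"));
        // overridden in the file -> the file's value wins
        System.out.println(props.getProperty("kylin.table.snapshot.max_mb", "300"));
        // not in the file -> the in-script default is used
        System.out.println(props.getProperty("kylin.job.mapreduce.default.reduce.input.mb", "500"));
    }
}
```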

> Put KYLIN_JVM_SETTINGS to kylin.properties
> --
>
> Key: KYLIN-1830
> URL: https://issues.apache.org/jira/browse/KYLIN-1830
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Richard Calaba
>Priority: Minor
>  Labels: newbie
> Attachments: kylin.properties, setenv.sh
>
>
> Currently the KYLIN_JVM_SETTINGS variable is stored in ./bin/setenv.sh 
> ... which is not wrong, but as we also have some other memory-specific 
> settings in the ./conf/kylin.properties file (like e.g. 
> kylin.job.mapreduce.default.reduce.input.mb or kylin.table.snapshot.max_mb), 
> it might be a good idea to have those performance and sizing related parameters 
> in one location.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KYLIN-1857) Show available memory on UI - in System Tab (and other runtime statistics)

2016-07-07 Thread Richard Calaba (JIRA)
Richard Calaba created KYLIN-1857:
-

 Summary: Show available memory on UI - in System Tab (and other 
runtime statistics)
 Key: KYLIN-1857
 URL: https://issues.apache.org/jira/browse/KYLIN-1857
 Project: Kylin
  Issue Type: Improvement
Affects Versions: v1.5.2, v1.5.2.1
Reporter: Richard Calaba
Priority: Minor


I have run into a situation where Kylin dies (an exception in the log says heap 
out of memory) if I try to run 3 parallel cube builds with high-cardinality 
dimensions. It is a reproducible scenario. I have set the max snapshot size to 
2GB and -Xmx to 16GB. 

If I run the cube builds one-by-one -> Kylin doesn't die. 

As we have no idea about memory requirements before we start building the 
cube(s), for now it would be beneficial at least to monitor basic Kylin VM 
statistics, i.e.:

 -- current memory occupied by snapshots
 -- total memory allocation & total free memory 
 -- how many (and which) temporary (intermediate) objects (in 
hive/hbase/filesystem) are created ... 
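
The second bullet point maps directly onto the standard JVM Runtime API; a minimal sketch of what such a System tab could report (an assumed approach, not existing Kylin code - snapshot and intermediate-object counts would need Kylin-internal hooks):

```java
// Sketch: basic JVM heap statistics the proposed System tab could expose.
public class JvmStatsSketch {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        // totalMemory: heap currently reserved; freeMemory: unused part of it;
        // maxMemory: the -Xmx ceiling the heap may grow to
        System.out.println("heap total MB: " + rt.totalMemory() / mb);
        System.out.println("heap free  MB: " + rt.freeMemory() / mb);
        System.out.println("heap max   MB: " + rt.maxMemory() / mb);
    }
}
```

Comparing "heap max" against the configured snapshot limit would at least flag in advance that 3 parallel builds cannot all fit.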



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KYLIN-1856) Kylin shows old error in job step output after resume - specifically in #4 Step Name: Build Dimension Dictionary

2016-07-07 Thread Richard Calaba (JIRA)
Richard Calaba created KYLIN-1856:
-

 Summary: Kylin shows old error in job step output after resume - 
specifically in #4 Step Name: Build Dimension Dictionary
 Key: KYLIN-1856
 URL: https://issues.apache.org/jira/browse/KYLIN-1856
 Project: Kylin
  Issue Type: Bug
Affects Versions: v1.5.2, v1.5.2.1
Reporter: Richard Calaba
Priority: Minor


I have realized that if my job stops with an error and I try to recover the 
error and resume the job - then the latest step starts again from scratch. This 
is fine, but in my opinion the log of the Step should be cleared as well - now 
it is showing the error from my previous attempt.

Specifically observed in #4 Step Name: Build Dimension Dictionary - but it is 
probably a generic issue.

The ask is: clear the log of the Build Step if the job Step is resumed. Already 
when the job step is restarted, not after it is completed.

(if Kylin fails, e.g. for out of memory - it silently dies, and analyzing the 
step log shows the wrong error (from the previous run) - if it were empty -> I 
would know that the most probable cause was that Kylin died)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KYLIN-1856) Kylin shows old error in job step output after resume - specifically in #4 Step Name: Build Dimension Dictionary

2016-07-07 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-1856:
--
Description: 
I have realized that if my job stops with an error and I try to recover the 
error and resume the job - then the latest step starts again from scratch. This 
is fine, but in my opinion the log of the Step should be cleared as well - now 
it is showing the error from my previous attempt.

Specifically observed in #4 Step Name: Build Dimension Dictionary - but it is 
probably a generic issue.

To correct this: clear the log of the Build Step after the job Step is resumed. 
Already when the job step is restarted, not after it is completed.

(if Kylin fails, e.g. for out of memory - it silently dies, and analyzing the 
step log shows the wrong error (from the previous run) - if it were empty -> I 
would know that the most probable cause was that Kylin died)

  was:
I have realized that if my job stops with error and I try to recover the error 
and resume the job - then the latest step starts again from scratch. This is 
fine but in my opinion the log of the Step should clear as well - now it is 
showing the error from my previous attempt.

Specifically observed in #4 Step Name: Build Dimension Dictionary - but is 
probbaly generic issue.

To correct this: clear the log of the Build Step after the job Step is resumed. 
Already when the job step is restarted, not after it is completed.

(if Kylin fails i.e. for out of memmory - it silently dies and analyzing the 
step log shows wrong error (from previous run) - if it would be empty -> I 
would know that most probable cause was that Kylin died)


> Kylin shows old error in job step output after resume - specifically in #4 
> Step Name: Build Dimension Dictionary
> 
>
> Key: KYLIN-1856
> URL: https://issues.apache.org/jira/browse/KYLIN-1856
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Minor
>
> I have realized that if my job stops with an error and I try to recover the 
> error and resume the job - then the latest step starts again from scratch. 
> This is fine, but in my opinion the log of the Step should be cleared as well 
> - now it is showing the error from my previous attempt.
> Specifically observed in #4 Step Name: Build Dimension Dictionary - but it is 
> probably a generic issue.
> To correct this: clear the log of the Build Step after the job Step is 
> resumed. Already when the job step is restarted, not after it is completed.
> (if Kylin fails, e.g. for out of memory - it silently dies, and analyzing the 
> step log shows the wrong error (from the previous run) - if it were empty -> I 
> would know that the most probable cause was that Kylin died)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KYLIN-1856) Kylin shows old error in job step output after resume - specifically in #4 Step Name: Build Dimension Dictionary

2016-07-07 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-1856:
--
Description: 
I have realized that if my job stops with an error and I try to recover the 
error and resume the job - then the latest step starts again from scratch. This 
is fine, but in my opinion the log of the Step should be cleared as well - now 
it is showing the error from my previous attempt.

Specifically observed in #4 Step Name: Build Dimension Dictionary - but it is 
probably a generic issue.

The ask is: clear the log of the Build Step if the job Step is resumed. Already 
when the job step is restarted, not after it is completed.

(if Kylin fails, e.g. for out of memory - it silently dies, and analyzing the 
step log shows the wrong error (from the previous run) - if it were empty -> I 
would know that the most probable cause was that Kylin died)

  was:
I have realized that if my job stops with error and I try to recover the error 
and resume the job - then the latest step starts again from scratch. This is 
fine by in my opinion the log of the Step should clear as well - now it is 
showing the error from my previous attempt.

Specifically observed in #4 Step Name: Build Dimension Dictionary - but is 
probbaly generic issue.

Ask is: clear the log of the Build Step if job Step is resumed. Already when 
the job step is restarted, not after it is completed.

(if Kylin fails i.e. for out of memmory - it silently dies and analyzing the 
step log shows wrong error (from previous run) - if it would be empty -> I 
would know that most probable cause was that Kylin died)


> Kylin shows old error in job step output after resume - specifically in #4 
> Step Name: Build Dimension Dictionary
> 
>
> Key: KYLIN-1856
> URL: https://issues.apache.org/jira/browse/KYLIN-1856
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Minor
>
> I have realized that if my job stops with an error and I try to recover the 
> error and resume the job - then the latest step starts again from scratch. 
> This is fine, but in my opinion the log of the Step should be cleared as well 
> - now it is showing the error from my previous attempt.
> Specifically observed in #4 Step Name: Build Dimension Dictionary - but it is 
> probably a generic issue.
> The ask is: clear the log of the Build Step if the job Step is resumed. 
> Already when the job step is restarted, not after it is completed.
> (if Kylin fails, e.g. for out of memory - it silently dies, and analyzing the 
> step log shows the wrong error (from the previous run) - if it were empty -> I 
> would know that the most probable cause was that Kylin died)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-07-06 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365366#comment-15365366
 ] 

Richard Calaba edited comment on KYLIN-1834 at 7/7/16 12:14 AM:


Adding further logger statements, I found out that the TrieDictionary lookup 
fails in the method lookupSeqNoFromValue, in the last else branch ("else { // 
children are ordered by their first value byte") of the while loop:

while (true) {
    p = c + firstByteOffset;
    comp = BytesUtil.compareByteUnsigned(trieBytes[p], inpByte);
    if (comp == 0) { // continue in the matching child, reset n and loop again
        n = c;
        o++;
        break;
    } else if (comp < 0) { // try next child
        seq += BytesUtil.readUnsigned(trieBytes, c + sizeChildOffset, sizeNoValuesBeneath);
        if (checkFlag(c, BIT_IS_LAST_CHILD))
            return roundSeqNo(roundingFlag, seq - 1, -1, seq); // no child can match the next byte of input
        c = p + BytesUtil.readUnsigned(trieBytes, p - 1, 1);
    } else { // children are ordered by their first value byte
        // <<<< THIS CODE IS CAUSING RETURN -1
        return roundSeqNo(roundingFlag, seq - 1, -1, seq); // no child can match the next byte of input
    }
}

private int roundSeqNo(int roundingFlag, int i, int j, int k) {
    if (roundingFlag == 0)
        return j;
    else if (roundingFlag < 0)
        return i;
    else
        return k;
}

The roundingFlag is set to 0; we are using Int encoding with length = 8 for 
the affected dimension, and the dimension ID being reported in the "Value not 
exists!" error is of type Bigint.

That's pretty much all I can figure out ... so now a question for the guru who 
wrote the TrieDictionary encoding logic ... why is this code failing here?

BTW: Sorry for the code formatting - JIRA really sucks at this
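
To make the failure mode concrete, the quoted roundSeqNo can be exercised in isolation (a sketch built around the snippet above, not a full TrieDictionary): with roundingFlag == 0 an exact match is required, so a miss surfaces as -1, which Dictionary.getIdFromValueBytes then reports as "Value not exists!".

```java
// Sketch isolating the rounding logic from the quoted TrieDictionary snippet.
public class RoundSeqNoDemo {
    // copied from the quoted code: i = closest smaller, j = "not found", k = closest bigger
    static int roundSeqNo(int roundingFlag, int i, int j, int k) {
        if (roundingFlag == 0)
            return j;       // exact match required -> signal "not found"
        else if (roundingFlag < 0)
            return i;       // round down to the closest smaller ID
        else
            return k;       // round up to the closest bigger ID
    }

    public static void main(String[] args) {
        int seq = 5; // sequence number reached before the mismatching byte
        System.out.println(roundSeqNo(0, seq - 1, -1, seq));  // prints -1: triggers the exception
        System.out.println(roundSeqNo(-1, seq - 1, -1, seq)); // prints 4: rounds down
        System.out.println(roundSeqNo(1, seq - 1, -1, seq));  // prints 5: rounds up
    }
}
```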


was (Author: cal...@gmail.com):
Adding further logger statements I found out that the TrieEncoding fails in the 
method  lookupSeqNoFromValue in this laste else statement "else { // children 
are ordered by their first value byte" in the while loop:

while (true) {
    p = c + firstByteOffset;
    comp = BytesUtil.compareByteUnsigned(trieBytes[p], inpByte);
    if (comp == 0) { // continue in the matching child, reset n and loop again
        n = c;
        o++;
        break;
    } else if (comp < 0) { // try next child
        seq += BytesUtil.readUnsigned(trieBytes, c + sizeChildOffset, sizeNoValuesBeneath);
        if (checkFlag(c, BIT_IS_LAST_CHILD))
            return roundSeqNo(roundingFlag, seq - 1, -1, seq); // no child can match the next byte of input
        c = p + BytesUtil.readUnsigned(trieBytes, p - 1, 1);
    } else { // children are ordered by their first value byte
        // <<<< THIS CODE IS CAUSING RETURN -1
        return roundSeqNo(roundingFlag, seq - 1, -1, seq); // no child can match the next byte of input
    }
}

private int roundSeqNo(int roundingFlag, int i, int j, int k) {
    if (roundingFlag == 0)
        return j;
    else if (roundingFlag < 0)
        return i;
    else
        return k;
}

The roundingFlag is set to 0 ; we are using Int encoding with length = 8 for 
the affected Dimension  the dimension ID being reported in the Value not 
found is of type Bigint.

That's pretty much all I can figure out ... so now question to the guru whe 
wrote the TrieDictionary encoding logic ... why is this code failing here ???

> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build 
> Dimension Dictionary
> --
>
> Key: KYLIN-1834
> URL: https://issues.apache.org/jira/browse/KYLIN-1834
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Blocker
> Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>
> Getting exception in Step 4 - Build Dimension Dictionary:
> java.lang.IllegalArgumentException: Value not exists!
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
>   at 
> org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
>   at 

[jira] [Comment Edited] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-07-06 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365366#comment-15365366
 ] 

Richard Calaba edited comment on KYLIN-1834 at 7/7/16 12:12 AM:


Adding further logger statements, I found out that the trie encoding fails in 
the method lookupSeqNoFromValue, in the last else branch ("else { // children 
are ordered by their first value byte") of the inner while loop:

        while (true) {
            p = c + firstByteOffset;
            comp = BytesUtil.compareByteUnsigned(trieBytes[p], inpByte);
            if (comp == 0) { // continue in the matching child, reset n and loop again
                n = c;
                o++;
                break;
            } else if (comp < 0) { // try next child
                seq += BytesUtil.readUnsigned(trieBytes, c + sizeChildOffset, sizeNoValuesBeneath);
                if (checkFlag(c, BIT_IS_LAST_CHILD))
                    return roundSeqNo(roundingFlag, seq - 1, -1, seq); // no child can match the next byte of input
                c = p + BytesUtil.readUnsigned(trieBytes, p - 1, 1);
            } else { // children are ordered by their first value byte
                // <<< THIS CODE IS CAUSING RETURN -1
                return roundSeqNo(roundingFlag, seq - 1, -1, seq); // no child can match the next byte of input
            }
        }


    private int roundSeqNo(int roundingFlag, int i, int j, int k) {
        if (roundingFlag == 0)
            return j;
        else if (roundingFlag < 0)
            return i;
        else
            return k;
    }

The roundingFlag is set to 0; we are using Int encoding with length = 8 for 
the affected dimension. The dimension ID being reported in the "Value not 
found" error is of type bigint.

That's pretty much all I can figure out ... so now a question for the guru who 
wrote the TrieDictionary encoding logic: why is this code failing here?


was (Author: cal...@gmail.com):
Adding further logger statements, I found out that the trie encoding fails in 
the method lookupSeqNoFromValue, in the last else branch ("else { // children 
are ordered by their first value byte") of the inner while loop:

while (true) {
p = c + firstByteOffset;
comp = BytesUtil.compareByteUnsigned(trieBytes[p], inpByte);
if (comp == 0) { // continue in the matching child, reset n and 
loop again
n = c;
o++;
break;
} else if (comp < 0) { // try next child
seq += BytesUtil.readUnsigned(trieBytes, c + 
sizeChildOffset, sizeNoValuesBeneath);
if (checkFlag(c, BIT_IS_LAST_CHILD))
return roundSeqNo(roundingFlag, seq - 1, -1, seq); // 
no child can match the next byte of input
c = p + BytesUtil.readUnsigned(trieBytes, p - 1, 1);
} else { // children are ordered by their first value byte
                // <<< THIS CODE IS CAUSING RETURN -1
return roundSeqNo(roundingFlag, seq - 1, -1, seq); // no 
child can match the next byte of input
}
}

private int roundSeqNo(int roundingFlag, int i, int j, int k) {
if (roundingFlag == 0)
return j;
else if (roundingFlag < 0)
return i;
else
return k;
}

The roundingFlag is set to 0; we are using Int encoding with length = 8 for 
the affected dimension. The dimension ID being reported in the "Value not 
found" error is of type bigint.

That's pretty much all I can figure out ... so now a question for the guru who 
wrote the TrieDictionary encoding logic: why is this code failing here?

> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build 
> Dimension Dictionary
> --
>
> Key: KYLIN-1834
> URL: https://issues.apache.org/jira/browse/KYLIN-1834
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Blocker
> Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>
> Getting exception in Step 4 - Build Dimension Dictionary:
> java.lang.IllegalArgumentException: Value not exists!
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
>   at 
> org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
>   

[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-07-06 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365366#comment-15365366
 ] 

Richard Calaba commented on KYLIN-1834:
---

Adding further logger statements, I found out that the trie encoding fails in 
the method lookupSeqNoFromValue, in the last else branch ("else { // children 
are ordered by their first value byte") of the inner while loop:

while (true) {
p = c + firstByteOffset;
comp = BytesUtil.compareByteUnsigned(trieBytes[p], inpByte);
if (comp == 0) { // continue in the matching child, reset n and 
loop again
n = c;
o++;
break;
} else if (comp < 0) { // try next child
seq += BytesUtil.readUnsigned(trieBytes, c + 
sizeChildOffset, sizeNoValuesBeneath);
if (checkFlag(c, BIT_IS_LAST_CHILD))
return roundSeqNo(roundingFlag, seq - 1, -1, seq); // 
no child can match the next byte of input
c = p + BytesUtil.readUnsigned(trieBytes, p - 1, 1);
} else { // children are ordered by their first value byte
                // <<< THIS CODE IS CAUSING RETURN -1
return roundSeqNo(roundingFlag, seq - 1, -1, seq); // no 
child can match the next byte of input
}
}

private int roundSeqNo(int roundingFlag, int i, int j, int k) {
if (roundingFlag == 0)
return j;
else if (roundingFlag < 0)
return i;
else
return k;
}

The roundingFlag is set to 0; we are using Int encoding with length = 8 for 
the affected dimension. The dimension ID being reported in the "Value not 
found" error is of type bigint.

That's pretty much all I can figure out ... so now a question for the guru who 
wrote the TrieDictionary encoding logic: why is this code failing here?
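
One detail worth noting about the traversal quoted above: the child scan uses 
BytesUtil.compareByteUnsigned, i.e. the trie orders children by their UNSIGNED 
first byte. The sketch below (an illustrative re-implementation, not Kylin's 
actual BytesUtil) shows how Java's signed byte ordering differs from the 
unsigned one for bytes >= 0x80, which is exactly the range produced by 
serializing negative values such as the bigint -2857007631392161431 reported 
in the log; any producer/consumer disagreement on this ordering would make 
the scan fall into the "no child can match" branch.

```java
// Sketch of an unsigned byte comparison, assuming the semantics implied by
// the name BytesUtil.compareByteUnsigned; this is my re-implementation for
// illustration, not the project's code.
public class UnsignedCompareDemo {

    static int compareByteUnsigned(byte a, byte b) {
        return Integer.compare(a & 0xFF, b & 0xFF); // mask to 0..255 before comparing
    }

    public static void main(String[] args) {
        byte low = 0x01;
        byte high = (byte) 0x9C; // e.g. a byte from a serialized negative bigint

        // Signed view: (byte) 0x9C == -100, so it compares LESS than 0x01.
        System.out.println(high < low);                      // true
        // Unsigned view: 0x9C == 156, so it compares GREATER than 0x01.
        System.out.println(compareByteUnsigned(high, low));  // 1
    }
}
```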

> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build 
> Dimension Dictionary
> --
>
> Key: KYLIN-1834
> URL: https://issues.apache.org/jira/browse/KYLIN-1834
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Blocker
> Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>
> Getting exception in Step 4 - Build Dimension Dictionary:
> java.lang.IllegalArgumentException: Value not exists!
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
>   at 
> org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2
> The code which generates the exception is:
> org.apache.kylin.dimension.Dictionary.java:
>  /**
>  * A lower level API, return ID integer from raw value bytes. In case of 
> not found 
>  * 
>  * - if roundingFlag=0, throw IllegalArgumentException; 
>  * - if roundingFlag<0, the closest smaller ID integer if exist; 
>  * - if roundingFlag>0, the closest bigger ID integer if exist. 
>  * 
>  * Bypassing the cache 

[jira] [Comment Edited] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-07-06 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365245#comment-15365245
 ] 

Richard Calaba edited comment on KYLIN-1834 at 7/6/16 10:53 PM:


To further debug the issue I have modified TrieDictionary.java to add 
additional log info to method getIdFromValueBytesImpl:

@Override
protected int getIdFromValueBytesImpl(byte[] value, int offset, int len, 
int roundingFlag) {
int seq = lookupSeqNoFromValue(headSize, value, offset, offset + len, 
roundingFlag);
int id = calcIdFromSeqNo(seq);
if (id < 0)
{
logger.error("Not a valid value: " + 
bytesConvert.convertFromBytes(value, offset, len));
logger.error("Seq (="+seq+") returned by 
lookupSeqNoFromValue (headSize="+headSize+", value="+value+", 
offset="+offset+", len="+len+", roundingFlag="+roundingFlag);
logger.error("Id (="+id+") returned by 
calcIdFromSeqNo(seq) with nValues="+nValues+", baseId="+baseId);
}
return id;
}

Now I see this in kylin log:

2016-07-06 16:57:16,912 ERROR [pool-2-thread-7] dict.TrieDictionary:174 : Not a 
valid value: -2857007631392161431
2016-07-06 16:57:16,912 ERROR [pool-2-thread-7] dict.TrieDictionary:175 : Seq 
(=-1) returned by lookupSeqNoFromValue (headSize=64, value=[B@12647ae0, 
offset=0, len=20, roundingFlag=0
2016-07-06 16:57:16,912 ERROR [pool-2-thread-7] dict.TrieDictionary:176 : Id 
(=-1) returned by calcIdFromSeqNo(seq) with nValues=44703717, baseId=0
2016-07-06 16:57:16,917 ERROR [pool-2-thread-7] execution.AbstractExecutable:62 
: error execute 
HadoopShellExecutable{id=21521c0a-c06f-4ee9-b682-2c468bfaf526-03, name=Build 
Dimension Dictionary, state=RUNNING}
java.lang.IllegalArgumentException: Value not exists!
at 
org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160


So definitely the method lookupSeqNoFromValue fails while trying to encode the 
value:

nValues = 44703717 - not sure where this number comes from:
- # of distinct ids (customer_id) in the fact table is 10 873 977
- # of distinct ids (customer_id) in the lookup table is 13 645 863
- # of distinct IDs (transaction_id) - another high-cardinality dimension 
without a lookup table - is 115 732 839
- # of distinct combinations of date / customer_id in the fact table (2nd 
lookup table in the model using the high-cardinality dimension) is 31 663 787

So no idea where nValues = 44703717 comes from ...

Method lookupSeqNoFromValue source:


private int lookupSeqNoFromValue(int n, byte[] inp, int o, int inpEnd, int 
roundingFlag) {
if (o == inpEnd) // special 'empty' value
return checkFlag(headSize, BIT_IS_END_OF_VALUE) ? 0 : 
roundSeqNo(roundingFlag, -1, -1, 0);

int seq = 0; // the sequence no under track

while (true) {
// match the current node, note [0] of node's value has been matched
// when this node is selected by its parent
int p = n + firstByteOffset; // start of node's value
int end = p + BytesUtil.readUnsigned(trieBytes, p - 1, 1); // end 
of node's value
for (p++; p < end && o < inpEnd; p++, o++) { // note matching start 
from [1]
if (trieBytes[p] != inp[o]) {
int comp = BytesUtil.compareByteUnsigned(trieBytes[p], 
inp[o]);
if (comp < 0) {
seq += BytesUtil.readUnsigned(trieBytes, n + 
sizeChildOffset, sizeNoValuesBeneath);
}
return roundSeqNo(roundingFlag, seq - 1, -1, seq); // 
mismatch
}
}

// node completely matched, is input all consumed?
boolean isEndOfValue = checkFlag(n, BIT_IS_END_OF_VALUE);
if (o == inpEnd) {
return p == end && isEndOfValue ? seq : 
roundSeqNo(roundingFlag, seq - 1, -1, seq); // input all matched
}
if (isEndOfValue)
seq++;

// find a child to continue
int c = headSize + (BytesUtil.readUnsigned(trieBytes, n, 
sizeChildOffset) & childOffsetMask);
if (c == headSize) // has no children
return roundSeqNo(roundingFlag, seq - 1, -1, seq); // input 
only partially matched
byte inpByte = inp[o];
int comp;
while (true) {
p = c + firstByteOffset;
comp = BytesUtil.compareByteUnsigned(trieBytes[p], inpByte);
if (comp == 0) { // continue in the matching child, reset n and 
loop again
n = c;
o++;
break;
} else if (comp < 0) { // try next child
seq += 

[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-07-06 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365245#comment-15365245
 ] 

Richard Calaba commented on KYLIN-1834:
---

To further debug the issue I have modified TrieDictionary.java to add 
additional log info to method getIdFromValueBytesImpl:

@Override
protected int getIdFromValueBytesImpl(byte[] value, int offset, int len, 
int roundingFlag) {
int seq = lookupSeqNoFromValue(headSize, value, offset, offset + len, 
roundingFlag);
int id = calcIdFromSeqNo(seq);
if (id < 0)
{
logger.error("Not a valid value: " + 
bytesConvert.convertFromBytes(value, offset, len));
logger.error("Seq (="+seq+") returned by 
lookupSeqNoFromValue (headSize="+headSize+", value="+value+", 
offset="+offset+", len="+len+", roundingFlag="+roundingFlag);
logger.error("Id (="+id+") returned by 
calcIdFromSeqNo(seq) with nValues="+nValues+", baseId="+baseId);
}
return id;
}

Now I see this in kylin log:

2016-07-06 16:57:16,912 ERROR [pool-2-thread-7] dict.TrieDictionary:174 : Not a 
valid value: -2857007631392161431
2016-07-06 16:57:16,912 ERROR [pool-2-thread-7] dict.TrieDictionary:175 : Seq 
(=-1) returned by lookupSeqNoFromValue (headSize=64, value=[B@12647ae0, 
offset=0, len=20, roundingFlag=0
2016-07-06 16:57:16,912 ERROR [pool-2-thread-7] dict.TrieDictionary:176 : Id 
(=-1) returned by calcIdFromSeqNo(seq) with nValues=44703717, baseId=0
2016-07-06 16:57:16,917 ERROR [pool-2-thread-7] execution.AbstractExecutable:62 
: error execute 
HadoopShellExecutable{id=21521c0a-c06f-4ee9-b682-2c468bfaf526-03, name=Build 
Dimension Dictionary, state=RUNNING}
java.lang.IllegalArgumentException: Value not exists!
at 
org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160


So definitely the method lookupSeqNoFromValue fails while trying to encode the 
value:

nValues = 44703717 - not sure where this number comes from; the # of distinct 
ids in the dimension is approx. 13 mio.

Method lookupSeqNoFromValue source:


private int lookupSeqNoFromValue(int n, byte[] inp, int o, int inpEnd, int 
roundingFlag) {
if (o == inpEnd) // special 'empty' value
return checkFlag(headSize, BIT_IS_END_OF_VALUE) ? 0 : 
roundSeqNo(roundingFlag, -1, -1, 0);

int seq = 0; // the sequence no under track

while (true) {
// match the current node, note [0] of node's value has been matched
// when this node is selected by its parent
int p = n + firstByteOffset; // start of node's value
int end = p + BytesUtil.readUnsigned(trieBytes, p - 1, 1); // end 
of node's value
for (p++; p < end && o < inpEnd; p++, o++) { // note matching start 
from [1]
if (trieBytes[p] != inp[o]) {
int comp = BytesUtil.compareByteUnsigned(trieBytes[p], 
inp[o]);
if (comp < 0) {
seq += BytesUtil.readUnsigned(trieBytes, n + 
sizeChildOffset, sizeNoValuesBeneath);
}
return roundSeqNo(roundingFlag, seq - 1, -1, seq); // 
mismatch
}
}

// node completely matched, is input all consumed?
boolean isEndOfValue = checkFlag(n, BIT_IS_END_OF_VALUE);
if (o == inpEnd) {
return p == end && isEndOfValue ? seq : 
roundSeqNo(roundingFlag, seq - 1, -1, seq); // input all matched
}
if (isEndOfValue)
seq++;

// find a child to continue
int c = headSize + (BytesUtil.readUnsigned(trieBytes, n, 
sizeChildOffset) & childOffsetMask);
if (c == headSize) // has no children
return roundSeqNo(roundingFlag, seq - 1, -1, seq); // input 
only partially matched
byte inpByte = inp[o];
int comp;
while (true) {
p = c + firstByteOffset;
comp = BytesUtil.compareByteUnsigned(trieBytes[p], inpByte);
if (comp == 0) { // continue in the matching child, reset n and 
loop again
n = c;
o++;
break;
} else if (comp < 0) { // try next child
seq += BytesUtil.readUnsigned(trieBytes, c + 
sizeChildOffset, sizeNoValuesBeneath);
if (checkFlag(c, BIT_IS_LAST_CHILD))
return roundSeqNo(roundingFlag, seq - 1, -1, seq); // 
no child can match the next byte of input
c = p + BytesUtil.readUnsigned(trieBytes, p - 1, 1);
} else { // children are ordered by their first value byte
return roundSeqNo(roundingFlag, seq - 1, -1, seq); // no 
child 

[jira] [Comment Edited] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-07-06 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363777#comment-15363777
 ] 

Richard Calaba edited comment on KYLIN-1834 at 7/6/16 7:26 AM:
---

Addition: even with dictionary encoding type Int, length 25, the same "Value 
not found" exception is reached. The dimension has 10 mio. distinct IDs.

I didn't find any way in Kylin 1.5.2.1 to process such a dimension. It seems 
Kylin doesn't support high cardinality in its current design.

Analysis of the code: the error is raised because in TrieDictionary.java the 
check at line 172 [if (id < 0)] triggers ... the cause is most probably in 
the method lookupSeqNoFromValue:

@Override
protected int getIdFromValueBytesImpl(byte[] value, int offset, int len, 
int roundingFlag) {
int seq = lookupSeqNoFromValue(headSize, value, offset, offset + len, 
roundingFlag);
int id = calcIdFromSeqNo(seq);
if (id < 0)
logger.error("Not a valid value: " + 
bytesConvert.convertFromBytes(value, offset, len));
return id;
}


was (Author: cal...@gmail.com):
Addition: so even dictionary encoding type Int, length 25 - same Value not 
found exception reached.
The dimension has 10 mio. distinct IDs. 

I didn't find any way in Kylin 1.5.2.1 to process such dimension. Seems Kylin 
doesn't support high cardinality in current design.

> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build 
> Dimension Dictionary
> --
>
> Key: KYLIN-1834
> URL: https://issues.apache.org/jira/browse/KYLIN-1834
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Blocker
> Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>
> Getting exception in Step 4 - Build Dimension Dictionary:
> java.lang.IllegalArgumentException: Value not exists!
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
>   at 
> org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2
> The code which generates the exception is:
> org.apache.kylin.dimension.Dictionary.java:
>  /**
>  * A lower level API, return ID integer from raw value bytes. In case of 
> not found 
>  * 
>  * - if roundingFlag=0, throw IllegalArgumentException; 
>  * - if roundingFlag<0, the closest smaller ID integer if exist; 
>  * - if roundingFlag>0, the closest bigger ID integer if exist. 
>  * 
>  * Bypassing the cache layer, this could be significantly slower than 
> getIdFromValue(T value).
>  * 
>  * @throws IllegalArgumentException
>  * if value is not found in dictionary and rounding is off;
>  * or if rounding cannot find a smaller or bigger ID
>  */
> final public int getIdFromValueBytes(byte[] value, int offset, int len, 
> int roundingFlag) throws IllegalArgumentException {
> if (isNullByteForm(value, offset, len))
> return nullId();
> else {
> int id = getIdFromValueBytesImpl(value, offset, len, 
> 

[jira] [Comment Edited] (KYLIN-1827) Send mail notification when runtime exception throws during build/merge cube

2016-07-05 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363811#comment-15363811
 ] 

Richard Calaba edited comment on KYLIN-1827 at 7/6/16 5:40 AM:
---

Hello, I wonder where the configuration of the SMTP server used for sending 
the mail notifications is. I didn't find any, and I also didn't find any 
document describing how to configure notifications in Kylin.

Now, searching a little more, I found JIRA 
https://issues.apache.org/jira/browse/KYLIN-672, and there is:

kylin.properties:
# If true, will send email notification
mail.enabled=false
mail.host=
mail.username=
mail.password=
mail.sender=

So it looks like these would be the settings ... but the branch is for 2.x, so 
I am not sure whether the notifications configuration works in 1.5.x ... can 
anyone confirm?


was (Author: cal...@gmail.com):
Hello, I wonder where the configuration of the SMTP server used for sending 
the mail notifications is. I didn't find any, and I also didn't find any 
document describing how to configure notifications in Kylin.

> Send mail notification when runtime exception throws during build/merge cube
> 
>
> Key: KYLIN-1827
> URL: https://issues.apache.org/jira/browse/KYLIN-1827
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v1.5.1, v1.5.2
>Reporter: Ma Gang
>Assignee: Ma Gang
>
> Currently mail notification is only sent in the onExecuteFinished() method, 
> but no notification will be sent when RuntimeException throws, that may cause 
> user miss some important job build failure, especially for some automation 
> merge jobs.  Sometimes job state update fails(the hbase metastore is 
> unavailable in a short time), it will make the job always look like in a 
> running state, but actually it is failed, should send mail to notify user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1827) Send mail notification when runtime exception throws during build/merge cube

2016-07-05 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363811#comment-15363811
 ] 

Richard Calaba commented on KYLIN-1827:
---

Hello, I wonder where the configuration of the SMTP server used for sending 
the mail notifications is. I didn't find any, and I also didn't find any 
document describing how to configure notifications in Kylin.

> Send mail notification when runtime exception throws during build/merge cube
> 
>
> Key: KYLIN-1827
> URL: https://issues.apache.org/jira/browse/KYLIN-1827
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v1.5.1, v1.5.2
>Reporter: Ma Gang
>Assignee: Ma Gang
>
> Currently mail notification is only sent in the onExecuteFinished() method, 
> but no notification will be sent when RuntimeException throws, that may cause 
> user miss some important job build failure, especially for some automation 
> merge jobs.  Sometimes job state update fails(the hbase metastore is 
> unavailable in a short time), it will make the job always look like in a 
> running state, but actually it is failed, should send mail to notify user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KYLIN-1794) Enable job list even some job metadata parsing failed

2016-07-05 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363803#comment-15363803
 ] 

Richard Calaba edited comment on KYLIN-1794 at 7/6/16 5:32 AM:
---

Hello, I have observed the same problem. I do not have a stack trace, but this 
is what happened:

- I had some finished and some ERROR jobs in my 1.5.2.1 Kylin - I compiled and 
installed the latest snapshot of Kylin-1.5.3-SNAPSHOT over the 1.5.2.1 ...

After that, I was getting an error on the UI while visiting the "Monitor" ... 
and the Monitor section was empty (even though jobs were previously listed 
there).

The error didn't go away even when I downgraded back to 1.5.2.1 ... so I 
assume some incompatibility in metadata between 1.5.2.1 and 1.5.3-SNAPSHOT ... 
I was having similar issues when testing earlier releases and switching 
between versions ...

The only recovery option is to do a full cleanup of the metadata repository.

Hope this helps to find this bug.

I believe the original reporter is simply asking for a way to catch exceptions 
while loading the monitoring UI and, if any are thrown, log them and continue 
with loading the next jobs ...


was (Author: cal...@gmail.com):
Hello, I have observed the same problem. I do not have a stack trace, but this 
is what happened:

- I had some finished and some ERROR jobs in my 1.5.2.1 Kylin - over this I 
compiled and installed the latest snapshot of Kylin-1.5.3-SNAPSHOT ...

After that, I was getting an error on the UI while visiting the "Monitor" ... 
the error didn't go away even when I downgraded back to 1.5.2.1 ... so I 
assume some incompatibility in metadata between 1.5.2.1 and 1.5.3-SNAPSHOT ... 
I was having similar issues when testing earlier releases and switching 
between versions ...

The only recovery option is to do a full cleanup of the metadata repository.

Hope this helps to find this bug.

I believe the original reporter is simply asking for a way to catch exceptions 
while loading the monitoring UI and, if any are thrown, log them and continue 
with loading the next jobs ...

> Enable job list even some job metadata parsing failed
> -
>
> Key: KYLIN-1794
> URL: https://issues.apache.org/jira/browse/KYLIN-1794
> Project: Kylin
>  Issue Type: Bug
>  Components: Metadata
>Reporter: Roger Shi
>Assignee: Shaofeng SHI
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1794) Enable job list even some job metadata parsing failed

2016-07-05 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363803#comment-15363803
 ] 

Richard Calaba commented on KYLIN-1794:
---

Hello, I have observed the same problem. I do not have a stack trace, but this 
is what happened:

- I had some finished and some ERROR jobs in my 1.5.2.1 Kylin - over this I 
compiled and installed the latest snapshot of Kylin-1.5.3-SNAPSHOT ...

After that, I was getting an error on the UI while visiting the "Monitor" ... 
the error didn't go away even when I downgraded back to 1.5.2.1 ... so I 
assume some incompatibility in metadata between 1.5.2.1 and 1.5.3-SNAPSHOT ... 
I was having similar issues when testing earlier releases and switching 
between versions ...

The only recovery option is to do a full cleanup of the metadata repository.

Hope this helps to find this bug.

I believe the original reporter is simply asking for a way to catch exceptions 
while loading the monitoring UI and, if any are thrown, log them and continue 
with loading the next jobs ...

> Enable job list even some job metadata parsing failed
> -
>
> Key: KYLIN-1794
> URL: https://issues.apache.org/jira/browse/KYLIN-1794
> Project: Kylin
>  Issue Type: Bug
>  Components: Metadata
>Reporter: Roger Shi
>Assignee: Shaofeng SHI
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-07-05 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363777#comment-15363777
 ] 

Richard Calaba commented on KYLIN-1834:
---

Addition: even with dictionary encoding type Int, length 25, the same "Value 
not found" exception is reached. The dimension has 10 mio. distinct IDs.

I didn't find any way in Kylin 1.5.2.1 to process such a dimension. It seems 
Kylin doesn't support high cardinality in its current design.

> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build 
> Dimension Dictionary
> --
>
> Key: KYLIN-1834
> URL: https://issues.apache.org/jira/browse/KYLIN-1834
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Blocker
> Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>
> Getting exception in Step 4 - Build Dimension Dictionary:
> java.lang.IllegalArgumentException: Value not exists!
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
>   at 
> org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2
> The code which generates the exception is:
> org.apache.kylin.dimension.Dictionary.java:
>  /**
>  * A lower level API, return ID integer from raw value bytes. In case of 
> not found 
>  * 
>  * - if roundingFlag=0, throw IllegalArgumentException; 
>  * - if roundingFlag<0, the closest smaller ID integer if exist; 
>  * - if roundingFlag>0, the closest bigger ID integer if exist. 
>  * 
>  * Bypassing the cache layer, this could be significantly slower than 
> getIdFromValue(T value).
>  * 
>  * @throws IllegalArgumentException
>  * if value is not found in dictionary and rounding is off;
>  * or if rounding cannot find a smaller or bigger ID
>  */
> final public int getIdFromValueBytes(byte[] value, int offset, int len, 
> int roundingFlag) throws IllegalArgumentException {
> if (isNullByteForm(value, offset, len))
> return nullId();
> else {
> int id = getIdFromValueBytesImpl(value, offset, len, 
> roundingFlag);
> if (id < 0)
> throw new IllegalArgumentException("Value not exists!");
> return id;
> }
> } 
> ==
> The Cube is big - fact 110 mio rows, the largest dimension (customer) has 10 
> mio rows. I have increased the JVM -Xmx to 16gb and set the 
> kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube 
> build doesn't fail (previously we were getting exception complaining about 
> the 300MB limit for Dimension dictionary size (req. approx 700MB)).
> ==
> Before that we were getting exception complaining about the Dictionary 
> encoding problem - "Too high cardinality is not suitable for dictionary -- 
> cardinality: 10873977" - this we resolved by changing the affected 
> dimension/row key Encoding from 

[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-07-05 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363767#comment-15363767
 ] 

Richard Calaba commented on KYLIN-1834:
---

I did try everything possible. The only thing I didn't try was to switch off 
dictionary encoding completely for this dimension ... is it possible ??? How?

I tried:
  - fixed-length encoding with length 20
  - Int encoding with max. size 8 - even with size 10 (even though the UI 
complains that 8 is the max)
Always got the same exception -> Value not found!

Now trying a final test - Int with length 25 - not sure what the length relates 
to - whether # of bytes or # of decimals ... - so trying length 25
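My current assumption - not verified against the Kylin source - is that the 
length here is a byte count in the 1-8 range, not a number of decimal digits, 
so 8 bytes already cover the full Bigint range and 25 would be out of range. A 
quick standalone sketch of why (illustrative only, not Kylin's actual 
IntegerDimEnc implementation):

```java
// Illustrative sketch only -- not Kylin's actual encoding implementation.
// It shows why an integer encoding length is a byte count: 8 bytes already
// cover the full signed 64-bit (Bigint) range, so lengths above 8 buy nothing.
public class IntEncodingSketch {
    // Encode the low `len` bytes of `value`, big-endian.
    static byte[] encode(long value, int len) {
        byte[] out = new byte[len];
        for (int i = len - 1; i >= 0; i--) {
            out[i] = (byte) (value & 0xFF);
            value >>= 8;
        }
        return out;
    }

    // Decode the bytes back into a long (sign-extension handling omitted
    // for lengths shorter than 8).
    static long decode(byte[] bytes) {
        long v = 0;
        for (byte b : bytes) {
            v = (v << 8) | (b & 0xFF);
        }
        return v;
    }

    public static void main(String[] args) {
        long customerId = 1234567890123L;      // hypothetical 13-digit id
        byte[] enc = encode(customerId, 8);    // 8 bytes is always enough
        System.out.println(decode(enc) == customerId);  // true
        // With too few bytes the high bits are silently truncated:
        byte[] enc4 = encode(customerId, 4);
        System.out.println(decode(enc4) == customerId); // false
    }
}
```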

> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build 
> Dimension Dictionary
> --
>
> Key: KYLIN-1834
> URL: https://issues.apache.org/jira/browse/KYLIN-1834
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Blocker
> Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>
> Getting exception in Step 4 - Build Dimension Dictionary:
> java.lang.IllegalArgumentException: Value not exists!
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
>   at 
> org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2
> The code which generates the exception is:
> org.apache.kylin.dimension.Dictionary.java:
>  /**
>  * A lower level API, return ID integer from raw value bytes. In case of 
> not found 
>  * 
>  * - if roundingFlag=0, throw IllegalArgumentException; 
>  * - if roundingFlag<0, the closest smaller ID integer if exist; 
>  * - if roundingFlag>0, the closest bigger ID integer if exist. 
>  * 
>  * Bypassing the cache layer, this could be significantly slower than 
> getIdFromValue(T value).
>  * 
>  * @throws IllegalArgumentException
>  * if value is not found in dictionary and rounding is off;
>  * or if rounding cannot find a smaller or bigger ID
>  */
> final public int getIdFromValueBytes(byte[] value, int offset, int len, 
> int roundingFlag) throws IllegalArgumentException {
> if (isNullByteForm(value, offset, len))
> return nullId();
> else {
> int id = getIdFromValueBytesImpl(value, offset, len, 
> roundingFlag);
> if (id < 0)
> throw new IllegalArgumentException("Value not exists!");
> return id;
> }
> } 
> ==
> The Cube is big - fact 110 mio rows, the largest dimension (customer) has 10 
> mio rows. I have increased the JVM -Xmx to 16gb and set the 
> kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube 
> build doesn't fail (previously we were getting exception complaining about 
> the 300MB limit for Dimension dictionary size (req. approx 700MB)).
> ==
> Before that we were 

[jira] [Commented] (KYLIN-1835) Error: java.lang.NumberFormatException: For input count_distinct on Big Int ??? (#7 Step Name: Build Base Cuboid Data)

2016-07-05 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363743#comment-15363743
 ] 

Richard Calaba commented on KYLIN-1835:
---

What if I use a Bigint ID but I know that I have fewer than Integer.MAX_VALUE 
distinct values in my dimension ??? Any chance the code can be adjusted to 
support this ??
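For reference, the failure mode is reproducible in isolation - a minimal 
sketch, independent of Kylin's actual BitmapCounter internals: the quoted stack 
trace shows the bitmap path parsing the raw value with Integer.parseInt, which 
cannot hold a Bigint, even though the value itself fits in a long just fine.

```java
// Minimal reproduction of the failure mode from the quoted stack trace
// (Kylin internals aside): an exact-precision bitmap stores int ids, so the
// mapper parses the raw column value with Integer.parseInt, which cannot
// represent a Bigint such as -6628245177096591402.
public class BitmapOverflowSketch {
    public static void main(String[] args) {
        String bigintValue = "-6628245177096591402";
        try {
            Integer.parseInt(bigintValue);     // what the bitmap path does
        } catch (NumberFormatException e) {
            System.out.println("int parse fails: " + e.getMessage());
        }
        long ok = Long.parseLong(bigintValue); // the value itself is a valid long
        System.out.println(ok == -6628245177096591402L); // true
    }
}
```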

> Error: java.lang.NumberFormatException: For input count_distinct on Big Int 
> ??? (#7 Step Name: Build Base Cuboid Data)
> --
>
> Key: KYLIN-1835
> URL: https://issues.apache.org/jira/browse/KYLIN-1835
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Minor
>
> I believe I have discovered an error in Kylin related to count_distinct with 
> exact precision.
> I am not 100% sure - but everything points to the fact that there is a design 
> limit for count_distinct ... please assess / confirm / reject my observation.
> Background info:
> =
> - large fact table ~ 100 mio rows.
> - large customer dimension ~ 10 mio rows
> I defined 2 KPIs of type COUNT_DISTINCT - with exact precision (return type 
> bitmap) on 2 high-cardinality fields of type Bigint (# of values expected for 
> one measure max 15 000 000 distinct values; the 2nd measure can have more 
> distinct values ~ approx. 50 mil (just an estimate)). 
> Error info:
> 
> Cube Build runs fine till #7 Step Name: Build Base Cuboid Data - where it 
> errors out without further details in Kylin Log - it shows only "no counters 
> for job job_1463699962519_16085".
> The MR Logs of the job job_1463699962519_16085 show exceptions:
> 2016-06-28 02:22:24,019 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.NumberFormatException: For input string: 
> "-6628245177096591402"
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:495)
>   at java.lang.Integer.parseInt(Integer.java:527)
>   at 
> org.apache.kylin.measure.bitmap.BitmapCounter.add(BitmapCounter.java:63)
>   at 
> org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:106)
>   at 
> org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:98)
>   at 
> org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValueOf(BaseCuboidMapperBase.java:189)
>   at 
> org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValue(BaseCuboidMapperBase.java:159)
>   at 
> org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.outputKV(BaseCuboidMapperBase.java:206)
>   at 
> org.apache.kylin.engine.mr.steps.HiveToBaseCuboidMapper.map(HiveToBaseCuboidMapper.java:53)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Just reading the signature of the exception and connecting it to the measure 
> return type "bitmap" => it looks like choosing exact precision (which the UI 
> says is supported for int types) causes this exception because I am passing a 
> Bigint field.  
> If so -> is that a bug (refactoring for big int needed) or is it a design 
> limitation ??? Can count_distinct be implemented for bigint (with exact 
> precision), or do I have to use count_distinct with error rate instead 
> ???
> In case I do not need to calculate the count_distinct for all dimensions 
> combinations -  I might add some mandatory dimensions to the aggregation 
> group - but not sure if this would resolve this issue (assuming I keep the 
> exact precision counts) ... ???



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1844) High cardinality dimensions in memory

2016-07-05 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363738#comment-15363738
 ] 

Richard Calaba commented on KYLIN-1844:
---

How can we switch off the dictionary for high-cardinality dimensions ???

> High cardinality dimensions in memory
> -
>
> Key: KYLIN-1844
> URL: https://issues.apache.org/jira/browse/KYLIN-1844
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v1.2, v1.5.2
>Reporter: Abhilash L L
>Assignee: liyang
>
> A whole dimension is kept in memory.
> We should have a way to keep only a certain number / size of rows in memory. 
> An LRU cache for rows in the dimension will help keep memory in check.
> Why not store all the dimension data in HBase in a different table keyed by a 
> prefix of the dimension id, so that all calls to the dimensions (get based on 
> dim key) are mapped to HBase.
> This does mean it will cost more time on a miss.
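The LRU idea from the description could be sketched with a LinkedHashMap in 
access order - a hypothetical illustration, not actual Kylin code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the LRU idea from the description above -- hypothetical, not
// Kylin code. LinkedHashMap in access order evicts the least recently used
// dimension row once `maxRows` is exceeded; a miss would then fall back to
// an HBase get keyed by (dimensionId, rowKey).
public class DimensionRowCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxRows;

    public DimensionRowCache(int maxRows) {
        super(16, 0.75f, true);  // accessOrder = true gives LRU behavior
        this.maxRows = maxRows;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after every put; returning true drops the LRU entry.
        return size() > maxRows;
    }
}
```

On a `get` miss the caller would fetch the row from the backing HBase table and 
`put` it into the cache, paying the extra round trip the description mentions.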





[jira] [Commented] (KYLIN-1775) Add Cube Migrate Support for Global Dictionary

2016-07-05 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363728#comment-15363728
 ] 

Richard Calaba commented on KYLIN-1775:
---

What is the attached patch applicable to ??? I tried against kylin-1.5.2.1 
(latest release) and it didn't succeed ... or is it against the current master 
branch ?

> Add Cube Migrate Support for Global Dictionary
> --
>
> Key: KYLIN-1775
> URL: https://issues.apache.org/jira/browse/KYLIN-1775
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: v1.5.3
>Reporter: Yerui Sun
>Assignee: Yerui Sun
> Fix For: v1.5.3
>
> Attachments: KYLIN-1775.patch
>
>
> Since KYLIN-1705, we've introduced global dictionary. The global dictionary 
> will serialize dict data into hdfs storage directly, instead of save in hbase 
> resource store. However, when cube was migrated from one metadata to another, 
> the global dict data didn't copy to the new metadata.





[jira] [Updated] (KYLIN-1847) Cleanup of Intermediate tables not working well

2016-07-04 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-1847:
--
Description: 
I have realized that Hive tables kylin_intermediate__ 
after cancelling all pending build jobs and dropping the cube are not cleaned 
properly. 

It could be that I didn't execute Purge before Dropping the cube ... just a 
theory, not 100% sure.

I also suspect that on hdfs in the /kylin/kylin_metadata/ directory I have too 
much uncleaned data ... considering that I have just now only 1 cube having a 
pending build job I see too many subdirectories there ...

There might be some relation to the JIRA I already reported as well ... 
https://issues.apache.org/jira/browse/KYLIN-1828 - but again not 100% sure this 
is the sole reason.

My impression is that the cleanup logic after the Drop cube needs to be 
re-checked.

  was:
I have realized that Hive tables kylin_intermediate__ 
after cancelling all pending build jobs and dropping the cube are no cleaned 
properly. 

It could be that I didn't execute Purge befor Dropping the cube ... just a 
theory, not 100% sure.

I also suspect that on hdfs in the /kylin/kylin_metadata/ directory I have too 
many uncleaned data ... considering that I have just now only 1 cube wih 
pending build job I see too many subdirectories there ...

There might be some relation to he JIRA I already reported as well ... 
https://issues.apache.org/jira/browse/KYLIN-1828 - but again not 100% sure this 
is the sole reason.

My impression is that the cleanup logic after the Drop cube needs to be 
re-checked.


> Cleanup of Intermediate tables not working well
> ---
>
> Key: KYLIN-1847
> URL: https://issues.apache.org/jira/browse/KYLIN-1847
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>
> I have realized that Hive tables 
> kylin_intermediate__ after cancelling all pending 
> build jobs and dropping the cube are not cleaned properly. 
> It could be that I didn't execute Purge before Dropping the cube ... just a 
> theory, not 100% sure.
> I also suspect that on hdfs in the /kylin/kylin_metadata/ directory I have 
> too much uncleaned data ... considering that I have just now only 1 cube 
> having a pending build job I see too many subdirectories there ...
> There might be some relation to the JIRA I already reported as well ... 
> https://issues.apache.org/jira/browse/KYLIN-1828 - but again not 100% sure 
> this is the sole reason.
> My impression is that the cleanup logic after the Drop cube needs to be 
> re-checked.





[jira] [Created] (KYLIN-1847) Cleanup of Intermediate tables not working well

2016-07-04 Thread Richard Calaba (JIRA)
Richard Calaba created KYLIN-1847:
-

 Summary: Cleanup of Intermediate tables not working well
 Key: KYLIN-1847
 URL: https://issues.apache.org/jira/browse/KYLIN-1847
 Project: Kylin
  Issue Type: Bug
Affects Versions: v1.5.2, v1.5.2.1
Reporter: Richard Calaba


I have realized that Hive tables kylin_intermediate__ 
after cancelling all pending build jobs and dropping the cube are not cleaned 
properly. 

It could be that I didn't execute Purge before Dropping the cube ... just a 
theory, not 100% sure.

I also suspect that on hdfs in the /kylin/kylin_metadata/ directory I have too 
much uncleaned data ... considering that I have just now only 1 cube with a 
pending build job I see too many subdirectories there ...

There might be some relation to the JIRA I already reported as well ... 
https://issues.apache.org/jira/browse/KYLIN-1828 - but again not 100% sure this 
is the sole reason.

My impression is that the cleanup logic after the Drop cube needs to be 
re-checked.





[jira] [Commented] (KYLIN-1388) Different realization under one model could share some cubing steps

2016-07-02 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360395#comment-15360395
 ] 

Richard Calaba commented on KYLIN-1388:
---

Also, dimension statistics (dictionaries) might be shareable across cubes if 
one lookup table is used in 2 different cubes (different models) with the same 
additional properties (dictionary encoding, ...).

> Different realization under one model could share some cubing steps
> ---
>
> Key: KYLIN-1388
> URL: https://issues.apache.org/jira/browse/KYLIN-1388
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>Assignee: hongbin ma
>
> The data model behind each realizations(cubes) has shared resources, most 
> significantly being the flattened hive table and the dictionaries. The 
> realizations can check if other realization (with the same model) has already 
> created shared resources. If yes, it can directly skip these steps to save 
> time/resource





[jira] [Closed] (KYLIN-1837) Feature request - cross cube reuse of Kylin fact/lookup snapshots ...

2016-07-02 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba closed KYLIN-1837.
-
Resolution: Duplicate

> Feature request - cross cube reuse of Kylin fact/lookup snapshots ...
> -
>
> Key: KYLIN-1837
> URL: https://issues.apache.org/jira/browse/KYLIN-1837
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: all
>Reporter: Richard Calaba
>Assignee: Dong Li
>
> Hello Kylin gurus,
> while debugging some issues with high-cardinality dimensions - which 
> obviously requires large data to be processed to emulate the problem, so the 
> Cube Build process takes significant time ... I came to this idea:
> - Can the Snapshot logic be reused across cubes ??
> - Let's say I have cube 1 and cube 2, where cube 2 is a clone of cube 1, 
> maybe with some dimensions removed, or even having the same dimensions and 
> just a different measures definition ... 
> - Cube 1 build fails somewhere in later steps (snapshot already built in 
> step 1, I believe) 
> - Running the build of the 2nd cube - which let's say uses exactly the same 
> dimension tables and in fact also the same fact table - also requires a long 
> run, because in Step 1 the build process is calculating the snapshots ... 
> which are already calculated (and still not discarded) by the Build Job of 
> Cube 1 
> Is there any chance to define some snapshot reuse scenarios like that (same 
> model/DB tables referred) ... so the modelling time can be shortened 
> while playing with the cube design ??? (i.e. testing various optimizations 
> like joint dimensions, etc. ... - those should not be impacted by the source 
> data stored in the already calculated snapshots, right ?)
> Obviously that should be an option while scheduling a Cube Build to 
> enable/disable reuse of snapshots from other similar cubes.





[jira] [Comment Edited] (KYLIN-1836) Kylin 1.5+ New Aggregation Group - UI improvements

2016-07-02 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360044#comment-15360044
 ] 

Richard Calaba edited comment on KYLIN-1836 at 7/2/16 4:55 PM:
---

Ad idea1 - yes - basically a similar idea - not specifically looking at target 
size but more at target compute complexity ... I will be following Idea 1 
at KYLIN-1743

Ad idea2 - well, the aim is this - you design the cube - and you have too many 
dimensions to calculate all cuboids. So you check your reporting requirements 
and try to come up with fewer dimensions, and define Aggregation Groups (mandatory 
/ joint dimensions, hierarchies, etc.) to further optimize the cube build time. 
So you come up with an optimized cube, it builds fine - most of the queries are 
running ... but a few are not - because either you optimized too much, or some of 
the reporting requirements were not clear, or assumptions about the data were not 
correct ... so the queries which fail with a no-realization exception should give 
you some feedback saying why no realization was found - from "main fact table not 
found in any cube" to "combination of dimensions a, b, c not supported by any 
cube". To overcome the problem of too many queries being reported there -> we can 
have a debug on/off switch, where you enable this for a certain period of time (or 
a short session) to debug the queries which are not finding any realization ...

Ad idea3 - the rowkey is getting clearer now - thank you. I understand the 
reordering to optimize HBase scans. I can guess some optimizations in the case of 
dictionary / fixed encodings. I understand the "int" encoding a little less 
- especially because it seems not to work for Bigint. 

What I am totally confused about is this: what is going to happen if I remove one 
of the dimensions from the rowkey ... will this dimension still be queryable 
??? Will the whole thing work correctly ?? The UI allows deleting sections of the 
rowkey ...



was (Author: cal...@gmail.com):
Ad idea1 - yes - basically similar idea - not specifically looking for target 
size but more at target compute complexity ... I will be following the Idea 1 
at the KYLIN-1743

Ad idea2 - well the aim is this - you design the cube - and you ahve too many 
dimensiuons to calculate all cuboids. So you check your reporting requirements 
and try to come up with less dimensions and define Aggregation Group (mandatory 
/ joint dimensions, hierarchies, etc.) to further optimize the cube build time. 
So you come up with optimized cube, builds fine - most of the queries are 
running ... but few are not - because either you optimized too much or some of 
the reporting requirements were not clear or assumptions about data correct ... 
so the queries which fail with no-realization exception should give you some 
feedback back saying why no realization was found - from "main fact table not 
found in any cube" to "combination of dimensions a, b, c not supported by any 
cube". To overcome the problem with too many queries reported there -> we can 
have debug on/off switch when you enable this for certian period of time (or 
short session) to debug the queries which are not finding any realization ...

Ad idea3 - the rowkey is getting more clear now - thank you. I understand the 
reordering to optimize HBase scans. I can guess some optimizations in case of 
using dictionary / fixed encodings. Little less I understand the "int" encoding 
- especially because it seems not to work for Bigint. 

What I am totaly confused with is this: what is gonna happen if I remove one of 
the dimensions from the rowkey ... will this dimension be still queryable ??? 
Will the whole thing work correctly ?? UI allows to delete sections of the 
rowkey ...


> Kylin 1.5+ New Aggregation Group - UI improvements
> --
>
> Key: KYLIN-1836
> URL: https://issues.apache.org/jira/browse/KYLIN-1836
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v1.5.0, v1.5.1, v1.5.2, v1.5.3, v1.5.2.1
>Reporter: Richard Calaba
>
> After reading the Tech Blog - 
> https://kylin.apache.org/blog/2016/02/18/new-aggregation-group/ from Hongbin 
> Ma - I got few ideas mentioned below - to help the Cube designers understand 
> impact of their cube design on the Build and Query performance - see below:
> (BTW: thank you for putting this Blog together !!! and thank you for 
> referencing this blog through the Kylin UI - the link in the Aggregation 
> Groups section !! - it is a very powerful optimization technique.)
> Idea 1
> =
>  It would be great if the Advanced Settings section on UI can calculate the 
> exact number of Cuboids defined by every Aggregation Group (# of combinations 
> ; # of pruned combinations (based on Hier/Joint and Mandatory Dimensions) and 
> then also showing the overall total of Cuboids considering 

[jira] [Comment Edited] (KYLIN-1836) Kylin 1.5+ New Aggregation Group - UI improvements

2016-07-02 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360044#comment-15360044
 ] 

Richard Calaba edited comment on KYLIN-1836 at 7/2/16 7:31 AM:
---

Ad idea1 - yes - basically a similar idea - not specifically looking at target 
size but more at target compute complexity ... I will be following Idea 1 
at KYLIN-1743

Ad idea2 - well, the aim is this - you design the cube - and you have too many 
dimensions to calculate all cuboids. So you check your reporting requirements 
and try to come up with fewer dimensions, and define Aggregation Groups (mandatory 
/ joint dimensions, hierarchies, etc.) to further optimize the cube build time. 
So you come up with an optimized cube, it builds fine - most of the queries are 
running ... but a few are not - because either you optimized too much, or some of 
the reporting requirements were not clear, or assumptions about the data were not 
correct ... so the queries which fail with a no-realization exception should give 
you some feedback saying why no realization was found - from "main fact table not 
found in any cube" to "combination of dimensions a, b, c not supported by any 
cube". To overcome the problem of too many queries being reported there -> we can 
have a debug on/off switch, where you enable this for a certain period of time (or 
a short session) to debug the queries which are not finding any realization ...

Ad idea3 - the rowkey is getting clearer now - thank you. I understand the 
reordering to optimize HBase scans. I can guess some optimizations in the case of 
dictionary / fixed encodings. I understand the "int" encoding a little less 
- especially because it seems not to work for Bigint. 

What I am totally confused about is this: what is going to happen if I remove one 
of the dimensions from the rowkey ... will this dimension still be queryable ??? 
Will the whole thing work correctly ?? The UI allows deleting sections of the 
rowkey ...



was (Author: cal...@gmail.com):
Ad idea1 - yes - basically similar idea - not specifically looking for target 
size but more at target compute complexity ... I will be following the Idea 1 
at the KYLIN-1743

Ad idea2 - well the aim is this - you design the cube - and you ahve too many 
dimensiuons to calculate all cuboids. So you check your reporting requirements 
and try to come up with less dimensions and define Aggregation Group (mandatory 
/ joint dimensions, hierarchies, etc.) to further optimize the cube build time. 
So you come up with optimized cube, builds fine - most of the queries are 
running ... but few are not - because either you optimized too much or some of 
the reporting requirements were not clear or assumptions about data correct ... 
so the queries which fail with no-realization exception should give you some 
feedback back saying why no realization was found - from "main fact table not 
found in any cube" to "combination of dimensions a, b, c not supported by any 
cube"

Ad idea3 - the rowkey is getting more clear now - thank you. I understand the 
reordering to optimize HBase scans. I can guess some optimizations in case of 
using dictionary / fixed encodings. Little less I understand the "int" encoding 
- especially because it seems not to work for Bigint. 

What I am totaly confused with is this: what is gonna happen if I remove one of 
the dimensions from the rowkey ... will this dimension be still queryable ??? 
Will the whole thing work correctly ?? UI allows to delete sections of the 
rowkey ...


> Kylin 1.5+ New Aggregation Group - UI improvements
> --
>
> Key: KYLIN-1836
> URL: https://issues.apache.org/jira/browse/KYLIN-1836
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v1.5.0, v1.5.1, v1.5.2, v1.5.3, v1.5.2.1
>Reporter: Richard Calaba
>
> After reading the Tech Blog - 
> https://kylin.apache.org/blog/2016/02/18/new-aggregation-group/ from Hongbin 
> Ma - I got few ideas mentioned below - to help the Cube designers understand 
> impact of their cube design on the Build and Query performance - see below:
> (BTW: thank you for putting this Blog together !!! and thank you for 
> referencing this blog through the Kylin UI - the link in the Aggregation 
> Groups section !! - it is a very powerful optimization technique.)
> Idea 1
> =
>  It would be great if the Advanced Settings section on UI can calculate the 
> exact number of Cuboids defined by every Aggregation Group (# of combinations 
> ; # of pruned combinations (based on Hier/Joint and Mandatory Dimensions) and 
> then also showing the overall total of Cuboids considering ALL the defined 
> Aggregation Groups.
> Idea 2
> =
> As Aggregation Group section is about optimizing # of necessary cuboids 
> assuming you know the queries patterns. This is sometimes easy but for more 
> complex dashboards 

[jira] [Commented] (KYLIN-1836) Kylin 1.5+ New Aggregation Group - UI improvements

2016-07-02 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360044#comment-15360044
 ] 

Richard Calaba commented on KYLIN-1836:
---

Ad idea1 - yes - basically a similar idea - not specifically looking at target 
size but more at target compute complexity ... I will be following Idea 1 
at KYLIN-1743

Ad idea2 - well, the aim is this - you design the cube - and you have too many 
dimensions to calculate all cuboids. So you check your reporting requirements 
and try to come up with fewer dimensions, and define Aggregation Groups (mandatory 
/ joint dimensions, hierarchies, etc.) to further optimize the cube build time. 
So you come up with an optimized cube, it builds fine - most of the queries are 
running ... but a few are not - because either you optimized too much, or some of 
the reporting requirements were not clear, or assumptions about the data were not 
correct ... so the queries which fail with a no-realization exception should give 
you some feedback saying why no realization was found - from "main fact table not 
found in any cube" to "combination of dimensions a, b, c not supported by any 
cube"

Ad idea3 - the rowkey is getting clearer now - thank you. I understand the 
reordering to optimize HBase scans. I can guess some optimizations in the case of 
dictionary / fixed encodings. I understand the "int" encoding a little less 
- especially because it seems not to work for Bigint. 

What I am totally confused about is this: what is going to happen if I remove one 
of the dimensions from the rowkey ... will this dimension still be queryable ??? 
Will the whole thing work correctly ?? The UI allows deleting sections of the 
rowkey ...
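Regarding Idea 1 below, the per-aggregation-group cuboid count could be 
computed with the usual combination rules - this is my reading of the 
aggregation-group blog, not Kylin's actual counting code: mandatory dimensions 
appear in every cuboid and don't multiply the count, a joint group toggles as 
one unit, a hierarchy of depth h contributes h + 1 choices (each prefix, or 
absent), and each remaining "normal" dimension doubles the count.

```java
// Sketch of the cuboid count for one aggregation group, under the combination
// rules described in the lead-in (an assumption, not Kylin's implementation).
// The empty combination is subtracted at the end.
public class CuboidCountSketch {
    static long cuboids(int normalDims, int jointGroups, int[] hierarchyDepths) {
        // Each normal dim and each joint group is a binary choice...
        long count = 1L << (normalDims + jointGroups);
        // ...and each hierarchy of depth h contributes h + 1 choices.
        for (int depth : hierarchyDepths) {
            count *= depth + 1;
        }
        return count - 1;  // drop the empty cuboid
    }

    public static void main(String[] args) {
        // e.g. 3 normal dims, 1 joint group, one 3-level hierarchy:
        System.out.println(cuboids(3, 1, new int[]{3}));  // 2^4 * 4 - 1 = 63
    }
}
```

Summing this over all aggregation groups (minus overlaps) would give the 
overall total the UI could display.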


> Kylin 1.5+ New Aggregation Group - UI improvements
> --
>
> Key: KYLIN-1836
> URL: https://issues.apache.org/jira/browse/KYLIN-1836
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v1.5.0, v1.5.1, v1.5.2, v1.5.3, v1.5.2.1
>Reporter: Richard Calaba
>
> After reading the Tech Blog - 
> https://kylin.apache.org/blog/2016/02/18/new-aggregation-group/ from Hongbin 
> Ma - I got a few ideas, mentioned below, to help Cube designers understand 
> the impact of their cube design on Build and Query performance - see below:
> (BTW: thank you for putting this Blog together !!! And thank you for 
> referencing this blog through the Kylin UI - the link in the Aggregation 
> Groups section !! - it is a very powerful optimization technique.)
> Idea 1
> =
>  It would be great if the Advanced Settings section of the UI could 
> calculate the exact number of Cuboids defined by every Aggregation Group (# 
> of combinations; # of pruned combinations (based on Hierarchy/Joint and 
> Mandatory Dimensions)) and then also show the overall total of Cuboids 
> considering ALL the defined Aggregation Groups.
> Idea 2
> =
> The Aggregation Group section is about optimizing the # of necessary 
> cuboids, assuming you know the query patterns. This is sometimes easy, but 
> for more complex dashboards where multiple people work on defining the 
> queries it is hard to control and guess. Thus I would suggest adding a new 
> tab in the Monitor Kylin UI - next to Jobs and Slow Queries, add an 
> additional tab "Non-satisfied Queries" showing the queries which could not 
> be evaluated by Kylin - queries which end with a "No Realization" exception. 
> Together with the query SQL (including all the parameters) it would help to 
> show the "missing dimension name" used in the query which was the cause for 
> not finding a proper Cuboid.
> Idea 3
> =
> Can anyone also document the Rowkeys section in the same part of the UI 
> (Advanced Settings) ??? It is not really clear what effect it will have if I 
> start playing with the Rowkeys section (adding/removing dimension fields; 
> adding non-dimension fields, ...). All I understand is that the "Rowkeys" 
> section only impacts the HBase storage of calculated cuboids, and thus 
> doesn't impact the Cube Build time that much (except that the Trie for the 
> dictionary needs to be built for every rowkey specified on this tab). I 
> understand that the major impact of the Rowkeys section is thus only on 
> HBase size / region splits and therefore also on query execution time. 
> What I am confused about is whether I can define a high-cardinality 
> dimension in the Cube and remove it from the Rowkeys section ??? What would 
> happen to HBase storage and the expected query time ... would that dimension 
> still be query-enabled ??
> The closest explanation I found is this Reply from - Yu Feng's here 
> http://apache-kylin.74782.x6.nabble.com/Relationship-between-rowkey-column-length-and-cube-size-td3174.html
> ==
> Reply: Cube size determines how to split region for table in hbase after 
> generate 
> all cuboid 

[jira] [Comment Edited] (KYLIN-1830) Put KYLIN_JVM_SETTINGS to kylin.properties

2016-07-01 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359948#comment-15359948
 ] 

Richard Calaba edited comment on KYLIN-1830 at 7/2/16 3:18 AM:
---

Okay - fair enough - but I prefer having configuration parameters for sizing 
in one location (/conf), as we have other performance and execution 
environment related settings there.

Thus I have found for you a generic solution you can use for ALL variables in 
setenv.sh - using this solution you can override the parameters from setenv.sh 
with values specified in the conf/kylin.properties file, using the same name 
as the environment variable.

See attached & updated setenv.sh and kylin.properties files (both from Kylin 
1.5.2.1).

 Background info about implementation:

The trick is that I have defined 2 bash functions which try to read the 
property value override from conf/kylin.properties - if found, it uses the 
value specified there; if not found (or the value override is commented out 
with #), it uses the default value as specified in the bin/setenv.sh script.

Feel free to include it in the standard packaging - I was thinking of 
providing a patch here, but didn't find setenv.sh in the source code - 
probably generated?

Functions Defined - see setenv.sh attachment.

Example from setenv.sh - how to override the variable KYLIN_JVM_SETTINGS:

Instead of:

 export KYLIN_JVM_SETTINGS=""

Use:

 export_property_override KYLIN_JVM_SETTINGS ""

I.e:

export_property_override KYLIN_JVM_SETTINGS "-Xms1024M -Xmx4096M 
-XX:MaxPermSize=512M -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
-Xloggc:$KYLIN_HOME/logs/kylin.gc.$$ -XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"

Then in the conf/kylin.properties you can put the line:

KYLIN_JVM_SETTINGS="-Xms1024M -Xmx16g -XX:MaxPermSize=512M -verbose:gc 
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$KYLIN_HOME/logs/kylin.gc.$$ 
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"

Which will override the default value provided in setenv.sh

Tested and working. Enjoy!




was (Author: cal...@gmail.com):
Okay - fair enough - but I prefer having configuration parameters for sizing 
in one location (/conf), as we have other performance and execution 
environment related settings there.

Thus I have found for you a generic solution you can use for ALL variables in 
setenv.sh - using this solution you can override the parameters from setenv.sh 
with values specified in the conf/kylin.properties file, using the same name 
as the environment variable.

See attached & updated setenv.sh and kylin.properties files (both from Kylin 
1.5.2.1).

 Background info about implementation:

The trick is that I have defined 2 bash functions which try to read the 
property value override from conf/kylin.properties - if found, it uses the 
value specified there; if not found (or the value override is commented out 
with #), it uses the default value as specified in the bin/setenv.sh script.

Feel free to include it in the standard packaging - I was thinking of 
providing a patch here, but didn't find setenv.sh in the source code - 
probably generated?

Functions Defined:

function parse_properties() {
    # Look up property $2 in file $1 (skipping blank and commented lines);
    # print its value with surrounding quotes stripped and variables expanded,
    # or fall back to the default value $3 if the property is not set.
    local param_value=$(awk -F "=" "(!/^(\$|[[:space:]]*#)/) && (/$2/) { idx = index(\$0,\"=\"); print substr(\$0,idx+1)}" "$1" | sed -e 's/^"//' -e 's/"$//')
    if [[ -z "${param_value}" ]]; then
        echo "$3"
    else
        echo `eval echo "${param_value}"`
    fi
}

function export_property_override() {
    # default Kylin property file location - for environment value overrides
    local kylin_property_file=${KYLIN_HOME}/conf/kylin.properties
    export "$1"="$(parse_properties "${kylin_property_file}" "$1" "$2")"
}

# Example from setenv.sh - how to override the variable KYLIN_JVM_SETTINGS:

Instead of:

 export KYLIN_JVM_SETTINGS=""

Use:

 export_property_override KYLIN_JVM_SETTINGS ""

I.e:

export_property_override KYLIN_JVM_SETTINGS "-Xms1024M -Xmx4096M 
-XX:MaxPermSize=512M -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
-Xloggc:$KYLIN_HOME/logs/kylin.gc.$$ -XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"

Then in the conf/kylin.properties you can put the line:

KYLIN_JVM_SETTINGS="-Xms1024M -Xmx16g -XX:MaxPermSize=512M -verbose:gc 
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$KYLIN_HOME/logs/kylin.gc.$$ 
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"

Which will override the default value provided in setenv.sh

Tested and working. Enjoy!



> Put KYLIN_JVM_SETTINGS to kylin.properties
> --
>
> Key: KYLIN-1830
> URL: https://issues.apache.org/jira/browse/KYLIN-1830
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Richard 

[jira] [Commented] (KYLIN-1830) Put KYLIN_JVM_SETTINGS to kylin.properties

2016-07-01 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359948#comment-15359948
 ] 

Richard Calaba commented on KYLIN-1830:
---

Okay - fair enough - but I prefer having configuration parameters for sizing 
in one location (/conf), as we have other performance and execution 
environment related settings there.

Thus I have found for you a generic solution you can use for ALL variables in 
setenv.sh - using this solution you can override the parameters from setenv.sh 
with values specified in the conf/kylin.properties file, using the same name 
as the environment variable.

See attached & updated setenv.sh and kylin.properties files (both from Kylin 
1.5.2.1).

 Background info about implementation:

The trick is that I have defined 2 bash functions which try to read the 
property value override from conf/kylin.properties - if found, it uses the 
value specified there; if not found (or the value override is commented out 
with #), it uses the default value as specified in the bin/setenv.sh script.

Feel free to include it in the standard packaging - I was thinking of 
providing a patch here, but didn't find setenv.sh in the source code - 
probably generated?

Functions Defined:

function parse_properties() {
    # Look up property $2 in file $1 (skipping blank and commented lines);
    # print its value with surrounding quotes stripped and variables expanded,
    # or fall back to the default value $3 if the property is not set.
    local param_value=$(awk -F "=" "(!/^(\$|[[:space:]]*#)/) && (/$2/) { idx = index(\$0,\"=\"); print substr(\$0,idx+1)}" "$1" | sed -e 's/^"//' -e 's/"$//')
    if [[ -z "${param_value}" ]]; then
        echo "$3"
    else
        echo `eval echo "${param_value}"`
    fi
}

function export_property_override() {
    # default Kylin property file location - for environment value overrides
    local kylin_property_file=${KYLIN_HOME}/conf/kylin.properties
    export "$1"="$(parse_properties "${kylin_property_file}" "$1" "$2")"
}

# Example from setenv.sh - how to override the variable KYLIN_JVM_SETTINGS:

Instead of:

 export KYLIN_JVM_SETTINGS=""

Use:

 export_property_override KYLIN_JVM_SETTINGS ""

I.e:

export_property_override KYLIN_JVM_SETTINGS "-Xms1024M -Xmx4096M 
-XX:MaxPermSize=512M -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
-Xloggc:$KYLIN_HOME/logs/kylin.gc.$$ -XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"

Then in the conf/kylin.properties you can put the line:

KYLIN_JVM_SETTINGS="-Xms1024M -Xmx16g -XX:MaxPermSize=512M -verbose:gc 
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$KYLIN_HOME/logs/kylin.gc.$$ 
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"

Which will override the default value provided in setenv.sh

Tested and working. Enjoy!



> Put KYLIN_JVM_SETTINGS to kylin.properties
> --
>
> Key: KYLIN-1830
> URL: https://issues.apache.org/jira/browse/KYLIN-1830
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Richard Calaba
>Priority: Minor
> Attachments: kylin.properties, setenv.sh
>
>
> Currently the KYLIN_JVM_SETTINGS variable is stored in ./bin/setenv.sh 
> ... which is not wrong, but as we also have some other memory-specific 
> settings in the ./conf/kylin.properties file (like e.g. 
> kylin.job.mapreduce.default.reduce.input.mb or kylin.table.snapshot.max_mb) 
> it might be a good idea to have those performance and sizing related 
> parameters in one location.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KYLIN-1830) Put KYLIN_JVM_SETTINGS to kylin.properties

2016-07-01 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-1830:
--
Attachment: setenv.sh
kylin.properties

> Put KYLIN_JVM_SETTINGS to kylin.properties
> --
>
> Key: KYLIN-1830
> URL: https://issues.apache.org/jira/browse/KYLIN-1830
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Richard Calaba
>Priority: Minor
> Attachments: kylin.properties, setenv.sh
>
>
> Currently the KYLIN_JVM_SETTINGS variable is stored in ./bin/setenv.sh 
> ... which is not wrong, but as we also have some other memory-specific 
> settings in the ./conf/kylin.properties file (like e.g. 
> kylin.job.mapreduce.default.reduce.input.mb or kylin.table.snapshot.max_mb) 
> it might be a good idea to have those performance and sizing related 
> parameters in one location.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KYLIN-1835) Error: java.lang.NumberFormatException: For input count_distinct on Big Int ??? (#7 Step Name: Build Base Cuboid Data)

2016-06-30 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-1835:
--
Priority: Minor  (was: Critical)

> Error: java.lang.NumberFormatException: For input count_distinct on Big Int 
> ??? (#7 Step Name: Build Base Cuboid Data)
> --
>
> Key: KYLIN-1835
> URL: https://issues.apache.org/jira/browse/KYLIN-1835
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Minor
>
> I believe I have discovered an error in Kylin related to count_distinct 
> with exact precision.
> I am not 100% sure - but everything points to the fact that there is a 
> design limit for count_distinct ... please assess / confirm / reject my 
> observation.
> Background info:
> =
> - large fact table ~ 100 mio rows.
> - large customer dimension ~ 10 mio rows
> Defined 2 KPIs of type COUNT_DISTINCT - with exact precision (return type 
> bitmap) on 2 high-cardinality fields of type Bigint (# of values expected 
> for one measure: max 15,000,000 distinct values; the 2nd measure can have 
> more distinct values ~ approx. 50 mil (just an estimate)). 
> Error info:
> 
> Cube Build runs fine till #7 Step Name: Build Base Cuboid Data - where it 
> errors out without further details in Kylin Log - it shows only "no counters 
> for job job_1463699962519_16085".
> The MR logs of the job job_1463699962519_16085 show exceptions:
> 2016-06-28 02:22:24,019 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.NumberFormatException: For input string: 
> "-6628245177096591402"
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:495)
>   at java.lang.Integer.parseInt(Integer.java:527)
>   at 
> org.apache.kylin.measure.bitmap.BitmapCounter.add(BitmapCounter.java:63)
>   at 
> org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:106)
>   at 
> org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:98)
>   at 
> org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValueOf(BaseCuboidMapperBase.java:189)
>   at 
> org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValue(BaseCuboidMapperBase.java:159)
>   at 
> org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.outputKV(BaseCuboidMapperBase.java:206)
>   at 
> org.apache.kylin.engine.mr.steps.HiveToBaseCuboidMapper.map(HiveToBaseCuboidMapper.java:53)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Just reading the signature of the exception and connecting it to the 
> measure return type "bitmap" => it looks like choosing exact precision 
> (which the UI says is supported for int types) is causing this exception, 
> because I am passing a Bigint field.  
> If so -> is that a bug (refactoring for Bigint needed) or is it a design 
> limitation ??? Can count_distinct be implemented for Bigint (with exact 
> precision), or do I have to use count_distinct with an error rate instead 
> ???
> In case I do not need to calculate the count_distinct for all dimension 
> combinations - I might add some mandatory dimensions to the aggregation 
> group - but I am not sure if this would resolve the issue (assuming I keep 
> the exact precision counts) ... ???



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1835) Error: java.lang.NumberFormatException: For input count_distinct on Big Int ??? (#7 Step Name: Build Base Cuboid Data)

2016-06-30 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356743#comment-15356743
 ] 

Richard Calaba commented on KYLIN-1835:
---

Confirmed - after changing the count_distinct from 'bitmap' (precise) to 
'hllc16' (Error Rate < 1.22%) - the error no longer appears.


So the final question - is Bigint not supported for count_distinct ??? 

Lowering the priority due to this resolution ... 
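For reference, the failure is consistent with plain JDK parsing behavior: the value from the MR log above does not fit into an int, but does fit into a long. A minimal sketch of just that behavior (the class name is made up; the value is copied verbatim from the log):

```java
public class BigintParseSketch {
    public static void main(String[] args) {
        // Value taken verbatim from the failing job's NumberFormatException:
        String v = "-6628245177096591402";
        try {
            // Per the stack trace, BitmapCounter.add goes through
            // Integer.parseInt on the raw value - this throws, because
            // |v| > 2^31 - 1:
            Integer.parseInt(v);
        } catch (NumberFormatException e) {
            System.out.println("int parse failed");
        }
        // ... while the same value fits into a signed 64-bit long:
        System.out.println(Long.parseLong(v)); // prints -6628245177096591402
    }
}
```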

> Error: java.lang.NumberFormatException: For input count_distinct on Big Int 
> ??? (#7 Step Name: Build Base Cuboid Data)
> --
>
> Key: KYLIN-1835
> URL: https://issues.apache.org/jira/browse/KYLIN-1835
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Critical
>
> I believe I have discovered an error in Kylin related to count_distinct 
> with exact precision.
> I am not 100% sure - but everything points to the fact that there is a 
> design limit for count_distinct ... please assess / confirm / reject my 
> observation.
> Background info:
> =
> - large fact table ~ 100 mio rows.
> - large customer dimension ~ 10 mio rows
> Defined 2 KPIs of type COUNT_DISTINCT - with exact precision (return type 
> bitmap) on 2 high-cardinality fields of type Bigint (# of values expected 
> for one measure: max 15,000,000 distinct values; the 2nd measure can have 
> more distinct values ~ approx. 50 mil (just an estimate)). 
> Error info:
> 
> Cube Build runs fine till #7 Step Name: Build Base Cuboid Data - where it 
> errors out without further details in Kylin Log - it shows only "no counters 
> for job job_1463699962519_16085".
> The MR logs of the job job_1463699962519_16085 show exceptions:
> 2016-06-28 02:22:24,019 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.NumberFormatException: For input string: 
> "-6628245177096591402"
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:495)
>   at java.lang.Integer.parseInt(Integer.java:527)
>   at 
> org.apache.kylin.measure.bitmap.BitmapCounter.add(BitmapCounter.java:63)
>   at 
> org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:106)
>   at 
> org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:98)
>   at 
> org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValueOf(BaseCuboidMapperBase.java:189)
>   at 
> org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValue(BaseCuboidMapperBase.java:159)
>   at 
> org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.outputKV(BaseCuboidMapperBase.java:206)
>   at 
> org.apache.kylin.engine.mr.steps.HiveToBaseCuboidMapper.map(HiveToBaseCuboidMapper.java:53)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Just reading the signature of the exception and connecting it to the 
> measure return type "bitmap" => it looks like choosing exact precision 
> (which the UI says is supported for int types) is causing this exception, 
> because I am passing a Bigint field.  
> If so -> is that a bug (refactoring for Bigint needed) or is it a design 
> limitation ??? Can count_distinct be implemented for Bigint (with exact 
> precision), or do I have to use count_distinct with an error rate instead 
> ???
> In case I do not need to calculate the count_distinct for all dimension 
> combinations - I might add some mandatory dimensions to the aggregation 
> group - but I am not sure if this would resolve the issue (assuming I keep 
> the exact precision counts) ... ???



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-06-30 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356723#comment-15356723
 ] 

Richard Calaba commented on KYLIN-1834:
---

I have removed the count_distinct measures to make sure this is not the cause. 
The error is still there.

Further I found out:

1) The missing value the exception is complaining about is the field 
customer_id - which exists in the FACT table (as FK) and in the LOOKUP table 
as PK - LEFT OUTER JOIN
2) However, there is a 2nd dimension/LOOKUP - also LEFT OUTER JOIN - which is 
using 2 fields for the join (date and customer_id) - in this 2nd lookup table 
the value (of the customer_id field) which the exception is complaining about 
DOESN'T EXIST.

However, as it is a LEFT OUTER JOIN - it doesn't have to ... 

To me that almost looks like a bug in Kylin code ... is the dictionary built 
incorrectly because I have the field used in 2 LOOKUPs ???


> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build 
> Dimension Dictionary
> --
>
> Key: KYLIN-1834
> URL: https://issues.apache.org/jira/browse/KYLIN-1834
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Critical
> Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>
> Getting exception in Step 4 - Build Dimension Dictionary:
> java.lang.IllegalArgumentException: Value not exists!
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
>   at 
> org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2
> The code which generates the exception is:
> org.apache.kylin.dimension.Dictionary.java:
>  /**
>  * A lower level API, return ID integer from raw value bytes. In case of 
> not found 
>  * 
>  * - if roundingFlag=0, throw IllegalArgumentException; 
>  * - if roundingFlag<0, the closest smaller ID integer if exist; 
>  * - if roundingFlag>0, the closest bigger ID integer if exist. 
>  * 
>  * Bypassing the cache layer, this could be significantly slower than 
> getIdFromValue(T value).
>  * 
>  * @throws IllegalArgumentException
>  * if value is not found in dictionary and rounding is off;
>  * or if rounding cannot find a smaller or bigger ID
>  */
> final public int getIdFromValueBytes(byte[] value, int offset, int len, 
> int roundingFlag) throws IllegalArgumentException {
> if (isNullByteForm(value, offset, len))
> return nullId();
> else {
> int id = getIdFromValueBytesImpl(value, offset, len, 
> roundingFlag);
> if (id < 0)
> throw new IllegalArgumentException("Value not exists!");
> return id;
> }
> } 
> ==
> The Cube is big - fact 110 mio rows, the largest dimension (customer) has 10 
> mio rows. I have increased the JVM -Xmx to 16gb and set the 
> kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube 
> build 

[jira] [Updated] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-06-30 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-1834:
--
Priority: Blocker  (was: Critical)

> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build 
> Dimension Dictionary
> --
>
> Key: KYLIN-1834
> URL: https://issues.apache.org/jira/browse/KYLIN-1834
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Blocker
> Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>
> Getting exception in Step 4 - Build Dimension Dictionary:
> java.lang.IllegalArgumentException: Value not exists!
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
>   at 
> org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2
> The code which generates the exception is:
> org.apache.kylin.dimension.Dictionary.java:
>  /**
>  * A lower level API, return ID integer from raw value bytes. In case of 
> not found 
>  * 
>  * - if roundingFlag=0, throw IllegalArgumentException; 
>  * - if roundingFlag<0, the closest smaller ID integer if exist; 
>  * - if roundingFlag>0, the closest bigger ID integer if exist. 
>  * 
>  * Bypassing the cache layer, this could be significantly slower than 
> getIdFromValue(T value).
>  * 
>  * @throws IllegalArgumentException
>  * if value is not found in dictionary and rounding is off;
>  * or if rounding cannot find a smaller or bigger ID
>  */
> final public int getIdFromValueBytes(byte[] value, int offset, int len, 
> int roundingFlag) throws IllegalArgumentException {
> if (isNullByteForm(value, offset, len))
> return nullId();
> else {
> int id = getIdFromValueBytesImpl(value, offset, len, 
> roundingFlag);
> if (id < 0)
> throw new IllegalArgumentException("Value not exists!");
> return id;
> }
> } 
> ==
> The Cube is big - fact 110 mio rows, the largest dimension (customer) has 10 
> mio rows. I have increased the JVM -Xmx to 16gb and set the 
> kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube 
> build doesn't fail (previously we were getting an exception complaining 
> about the 300MB limit for the Dimension dictionary size (req. approx 700MB)).
> ==
> Before that we were getting an exception complaining about a Dictionary 
> encoding problem - "Too high cardinality is not suitable for dictionary -- 
> cardinality: 10873977" - this we resolved by changing the affected 
> dimension/row key Encoding from "dict" to "int; length=8" in the Advanced 
> Settings of the Cube.
> ==
> We have 2 high-cardinality fields (one from fact table and one from the big 
> dimension (customer - see above). We need to use in distinc_count 

[jira] [Updated] (KYLIN-1837) Feature request - cross cube reuse of Kylin fact/lookup snapshots ...

2016-06-28 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-1837:
--
Description: 
Hello Kylin gurus,

while debugging some issues with high cardinality dimensions - which obviously 
require large data to be processed to emulate the problem, and thus the Cube 
Build process takes significant time ... I came to this idea:

- Cannot the Snapshot logic be reused across cubes ??
- Let's say I have cube 1 and cube 2, which is a clone of cube 1, maybe with 
some dimensions removed, or even having the same dimensions and just a 
different measures definition ... 
- The Cube 1 build fails somewhere in the later steps (snapshots already 
built - in step 1, I believe) 
- Running the build of the 2nd cube - which let's say is using exactly the 
same dimension tables and in fact also the same fact table - this also 
requires a long run, because in Step 1 the build process is calculating the 
snapshots ... which are already calculated (and still not discarded) by the 
Build Job of Cube 1 

Is there any chance to define some snapshot reuse scenarios like that (same 
model/DB tables referred) ... so the modelling time can be shortened while 
playing with the cube design ??? (i.e. testing various optimizations like 
joint dimensions, etc. - those should not be impacted by the source data 
stored in the already calculated snapshots, right?)

Obviously that should be an option while scheduling a Cube Build - to 
enable/disable reuse of snapshots from other similar cubes.

  was:
Hello Kylin gurus,

while debugging some issues with high cardinality dimensions - which obviously 
requires large data to be processed to emulate the problem thus the Cube Build 
process takes significant time ... I came to this idea:

- Cannot be the Snapshot logic - be resued cross cubes ??
- Let's say I have cube 1 and cube 2 which is clone of cube 1 maybe with 
removed some dimnesions or even having same dimensions and just having 
different measures definition ... 
- Cube 1 build fails somewhere in later steps (snaphost already built) in step 
1 I believe 
- Running build of 2nd cube - which let's say is using exactly same dimensions 
table and in fact also same fact table - this also requires long run because in 
the Step 1 the build process is calculating the snaphots ... which are already 
calculated (and still not discared) by the Build Job of Cube 1 

Is there any chance to define some snapshots reuse scenarios like that (same 
model/DB tables referred) ... so the modelling  time can be shortened 
while playing with the cube design ??? (i.e. testing various optimizations like 
joint dimensions, etc ...- those should not be impacted by the source data 
stored in the ealready calculated snapshots, right ?


> Feature request - cross cube reuse of Kylin fact/lookup snapshots ...
> -
>
> Key: KYLIN-1837
> URL: https://issues.apache.org/jira/browse/KYLIN-1837
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: all
>Reporter: Richard Calaba
>Assignee: Dong Li
>
> Hello Kylin gurus,
> while debugging some issues with high cardinality dimensions - which 
> obviously require large data volumes to reproduce, and thus the Cube Build 
> process takes significant time ... I came to this idea:
> - Couldn't the snapshot logic be reused across cubes?
> - Let's say I have cube 1 and cube 2, where cube 2 is a clone of cube 1, 
> perhaps with some dimensions removed, or even with the same dimensions and 
> just a different measure definition ... 
> - Cube 1's build fails somewhere in the later steps (snapshots already built 
> in step 1, I believe).
> - Running the build of the 2nd cube - which, let's say, uses exactly the same 
> dimension tables and in fact also the same fact table - also takes a long 
> time, because in Step 1 the build process calculates the snapshots ... which 
> were already calculated (and are still not discarded) by the Build Job of 
> Cube 1.
> Is there any chance to define snapshot reuse scenarios like that (same 
> model/DB tables referenced), so that the modelling time can be shortened 
> while iterating on the cube design? (i.e. testing various optimizations like 
> joint dimensions, etc. - those should not be impacted by the source data 
> stored in the already calculated snapshots, right?)
> Obviously that should be an option when scheduling a Cube Build, to 
> enable/disable reuse of snapshots from other similar cubes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KYLIN-1837) Feature request - cross cube reuse of Kylin fact/lookup snapshots ...

2016-06-28 Thread Richard Calaba (JIRA)
Richard Calaba created KYLIN-1837:
-

 Summary: Feature request - cross cube reuse of Kylin fact/lookup 
snapshots ...
 Key: KYLIN-1837
 URL: https://issues.apache.org/jira/browse/KYLIN-1837
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine
Affects Versions: all
Reporter: Richard Calaba
Assignee: Dong Li


Hello Kylin gurus,

while debugging some issues with high cardinality dimensions - which obviously 
require large data volumes to reproduce, and thus the Cube Build 
process takes significant time ... I came to this idea:

- Couldn't the snapshot logic be reused across cubes?
- Let's say I have cube 1 and cube 2, where cube 2 is a clone of cube 1, perhaps 
with some dimensions removed, or even with the same dimensions and just a 
different measure definition ... 
- Cube 1's build fails somewhere in the later steps (snapshots already built in 
step 1, I believe).
- Running the build of the 2nd cube - which, let's say, uses exactly the same 
dimension tables and in fact also the same fact table - also takes a long time, 
because in Step 1 the build process calculates the snapshots ... which were 
already calculated (and are still not discarded) by the Build Job of Cube 1.

Is there any chance to define snapshot reuse scenarios like that (same 
model/DB tables referenced), so that the modelling time can be shortened 
while iterating on the cube design? (i.e. testing various optimizations like 
joint dimensions, etc. - those should not be impacted by the source data 
stored in the already calculated snapshots, right?)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1835) Error: java.lang.NumberFormatException: For input count_distinct on Big Int ??? (#7 Step Name: Build Base Cuboid Data)

2016-06-28 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353824#comment-15353824
 ] 

Richard Calaba commented on KYLIN-1835:
---

After additional analysis, I believe it is related to the high-cardinality 
field/dimension in the fact table on which the distinct counts are calculated. 
The bitmap encoding that ensures exact count-distinct precision (and claims 
support for integer types) seems to work only for integers smaller than 4 
bytes - BigInt is causing the issue. 

I am testing this theory by changing the count_distinct precision to allow some 
error rate (<1.22%) ...

Can anyone confirm/reject my observations/conclusions in the meantime? The cube 
rebuild process will take several hours ...
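The stack trace in the issue below points at Integer.parseInt inside BitmapCounter.add; a minimal reproduction of that failure mode (assuming the dimension value is a 64-bit BIGINT id, as theorized above) is:

```java
// Minimal reproduction of the failure mode from the stack trace: a BIGINT value
// cannot be parsed into a 32-bit int, which is what a 32-bit bitmap index needs.
// This is an illustration of the suspected limit, not Kylin code.
public class BigintBitmapRepro {
    public static void main(String[] args) {
        long bigintId = -6628245177096591402L; // the value from the MR job log
        try {
            int id = Integer.parseInt(Long.toString(bigintId));
            System.out.println("parsed as int: " + id);
        } catch (NumberFormatException e) {
            // Anything outside [-2^31, 2^31-1] lands here - exactly the reported error.
            System.out.println("NumberFormatException for input: " + bigintId);
        }
    }
}
```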

> Error: java.lang.NumberFormatException: For input count_distinct on Big Int 
> ??? (#7 Step Name: Build Base Cuboid Data)
> --
>
> Key: KYLIN-1835
> URL: https://issues.apache.org/jira/browse/KYLIN-1835
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Critical
>
> I believe I have discovered an error in Kylin related to count_distinct with 
> exact precision.
> I am not 100% sure - but everything points to the fact that there is a design 
> limit for count_distinct ... please assess / confirm / reject my observation.
> Background info:
> =
> - large fact table ~ 100 mio rows.
> - large customer dimension ~ 10 mio rows
> Defined 2 KPIs of type COUNT_DISTINCT - with exact precision (return type 
> bitmap) on 2 high-cardinality fields of type BigInt (# of values expected for 
> one measure max 15,000,000 distinct values; the 2nd measure can have more 
> distinct values - approx. 50 mio (just an estimate). 
> Error info:
> 
> Cube Build runs fine until #7 Step Name: Build Base Cuboid Data - where it 
> errors out without further details in the Kylin log - it shows only "no 
> counters for job job_1463699962519_16085".
> The MR logs of job job_1463699962519_16085 show exceptions:
> 2016-06-28 02:22:24,019 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.NumberFormatException: For input string: 
> "-6628245177096591402"
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:495)
>   at java.lang.Integer.parseInt(Integer.java:527)
>   at 
> org.apache.kylin.measure.bitmap.BitmapCounter.add(BitmapCounter.java:63)
>   at 
> org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:106)
>   at 
> org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:98)
>   at 
> org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValueOf(BaseCuboidMapperBase.java:189)
>   at 
> org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValue(BaseCuboidMapperBase.java:159)
>   at 
> org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.outputKV(BaseCuboidMapperBase.java:206)
>   at 
> org.apache.kylin.engine.mr.steps.HiveToBaseCuboidMapper.map(HiveToBaseCuboidMapper.java:53)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Just reading the signature of the exception and connecting it to the measure 
> precision return type "bitmap" => it looks like choosing exact precision 
> (which the UI says is supported for int types) causes this exception because 
> I am passing a BigInt field.
> If so -> is that a bug (refactoring for BigInt needed) or is it a design 
> limitation? Can count_distinct be implemented for BigInt (with exact 
> precision), or do I have to use count_distinct with an error rate instead 
> ???
> In case I do not need to calculate the count_distinct for all dimension 
> combinations - I might add some mandatory dimensions to the aggregation 
> group - but I am not sure if this would resolve the issue (assuming I keep 
> the exact precision counts) ... ???



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-06-28 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353818#comment-15353818
 ] 

Richard Calaba commented on KYLIN-1834:
---

Additional info:

I further suspect that this exception is rooted in the fact that we are 
trying to use a high cardinality dimension (customer) with approx. 10 million 
values and customer_id defined as BigInt.

It seems that the Kylin engine is trying to build a dictionary for this 
dimension and this is failing ... we also defined a count_distinct measure for 
this dimension with exact precision, which needs to use bitmap encoding (which 
is supported for int types but has some issue - see my 
https://issues.apache.org/jira/browse/KYLIN-1835 - it seems we might need to 
switch to count_distinct calculations with some expected error rate (not using 
exact precision, which requires bitmap)).

So the main question is - in the case of 10 mio values (encoded as BigInt) - 
can we build a dictionary, and which method should we use ??? 
Do we have to build a dictionary for high cardinality dimensions if the only 
thing we need is a count_distinct measure ?? We do not need group-by and where 
conditions on the high cardinality dimension ...
 

> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build 
> Dimension Dictionary
> --
>
> Key: KYLIN-1834
> URL: https://issues.apache.org/jira/browse/KYLIN-1834
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Critical
> Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>
> Getting exception in Step 4 - Build Dimension Dictionary:
> java.lang.IllegalArgumentException: Value not exists!
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
>   at 
> org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2
> The code which generates the exception is:
> org.apache.kylin.dimension.Dictionary.java:
>  /**
>  * A lower level API, return ID integer from raw value bytes. In case of 
> not found 
>  * 
>  * - if roundingFlag=0, throw IllegalArgumentException; 
>  * - if roundingFlag<0, the closest smaller ID integer if exist; 
>  * - if roundingFlag>0, the closest bigger ID integer if exist. 
>  * 
>  * Bypassing the cache layer, this could be significantly slower than 
> getIdFromValue(T value).
>  * 
>  * @throws IllegalArgumentException
>  * if value is not found in dictionary and rounding is off;
>  * or if rounding cannot find a smaller or bigger ID
>  */
> final public int getIdFromValueBytes(byte[] value, int offset, int len, 
> int roundingFlag) throws IllegalArgumentException {
> if (isNullByteForm(value, offset, len))
> return nullId();
> else {
> int id = getIdFromValueBytesImpl(value, offset, len, 
> roundingFlag);
> if (id < 0)
> throw new IllegalArgumentException("Value not exists!");
> return id;
> }
> } 
> 
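The roundingFlag contract documented in the quoted Javadoc can be illustrated with a sorted array standing in for the trie. This is a sketch only - Kylin's TrieDictionary works differently internally, but the lookup semantics described above are the same.

```java
import java.util.Arrays;

// Sketch of the roundingFlag contract from the quoted Javadoc, using a sorted
// array as a stand-in for the TrieDictionary (illustration only, not Kylin code).
public class RoundingLookup {
    // ids are simply the indices into the sorted value array
    static int getId(long[] sortedValues, long value, int roundingFlag) {
        int pos = Arrays.binarySearch(sortedValues, value);
        if (pos >= 0) return pos;                         // exact match
        int insertion = -pos - 1;                         // index of first value > v
        if (roundingFlag < 0 && insertion > 0)
            return insertion - 1;                         // closest smaller id
        if (roundingFlag > 0 && insertion < sortedValues.length)
            return insertion;                             // closest bigger id
        throw new IllegalArgumentException("Value not exists!");
    }

    public static void main(String[] args) {
        long[] dict = {10L, 20L, 30L};
        System.out.println(getId(dict, 20L, 0));   // exact hit
        System.out.println(getId(dict, 25L, -1));  // rounds down to 20's id
        System.out.println(getId(dict, 25L, 1));   // rounds up to 30's id
        try {
            getId(dict, 25L, 0);                   // no rounding -> not found
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The snapshot-building path in the stack trace above calls the lookup with roundingFlag=0, which is why a value missing from the dictionary surfaces as "Value not exists!".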

[jira] [Updated] (KYLIN-1836) Kylin 1.5+ New Aggregation Group - UI improvements

2016-06-28 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-1836:
--
Description: 
After reading the Tech Blog - 
https://kylin.apache.org/blog/2016/02/18/new-aggregation-group/ from Hongbin Ma 
- I got a few ideas, mentioned below, to help Cube designers understand the 
impact of their cube design on Build and Query performance:

(BTW: thank you for putting this Blog together !!! and thank you for 
referencing this blog through the Kylin UI - the link in the Aggregation Groups 
section !! - it is a very powerful optimization technique.)

Idea 1
=

It would be great if the Advanced Settings section of the UI could calculate 
the exact number of Cuboids defined by every Aggregation Group (# of 
combinations; # of pruned combinations (based on Hierarchy/Joint and Mandatory 
Dimensions)) and then also show the overall total of Cuboids considering ALL 
the defined Aggregation Groups.

Idea 2
=

The Aggregation Group section is about optimizing the # of necessary cuboids, 
assuming you know the query patterns. This is sometimes easy, but for more 
complex dashboards where multiple people work on defining the queries it is 
hard to control and guess. I would therefore suggest adding a new tab in the 
Monitor section of the Kylin UI - next to Jobs and Slow Queries, an additional 
tab "Non-satisfied Queries" showing the queries which Kylin was not able to 
evaluate - queries which end with a "No Realization" exception. Together with 
the query SQL (including all the parameters) it would help to show the 
"missing dimension name" used in the query which was the cause for not finding 
a proper Cuboid.


Idea 3
=
Can anyone also document the Rowkeys section in the same part of the UI 
(Advanced Settings)? It is not really clear what effect it will have if I 
start playing with the Rowkeys section (adding/removing dimension fields; 
adding non-dimension fields, ...). All I understand is that the "Rowkeys" 
section impacts only the HBase storage of calculated cuboids. Thus it doesn't 
impact Cube Build time that much (except that the trie for the dictionary 
needs to be built for every rowkey specified on this tab). I understand that 
the major impact of the Rowkeys section is therefore on HBase size / region 
splits and thus also on query execution time. 

What I am confused about is whether I can define a high-cardinality dimension 
in the Cube and remove it from the Rowkeys section. What would happen to the 
HBase storage and the expected query time ... would that dimension still be 
query-enabled ??

The closest explanation I found is this reply from Yu Feng here: 
http://apache-kylin.74782.x6.nabble.com/Relationship-between-rowkey-column-length-and-cube-size-td3174.html
==
Reply: Cube size determines how to split regions for the table in HBase after 
generating all cuboid files. For example, if all of your cuboid files total 
100GB and your cube size is set to "SMALL", and the property for SMALL is 
10GB, Kylin will create the HBase table with 10 regions. It will calculate the 
start rowkey and end rowkey of every region before creating the htable, then 
create the table with that split information. 

Rowkey column length is another thing: you can choose to either use a 
dictionary or set the rowkey column length for every dimension. If you use a 
dictionary, Kylin will build a dictionary for this column (a trie tree), which 
means every value of the dimension will be encoded as a unique number value; 
because the dimension value is part of the HBase rowkey, the dictionary 
reduces the HBase table size. However, Kylin stores the dictionary in memory, 
so if the dimension cardinality is large, that becomes a problem. If you set 
the rowkey column length to N for a dimension, Kylin will not build a 
dictionary for it, and every value will be cut to an N-length string - so no 
dictionary in memory, but the rowkey in the HBase table will be longer. 
==

Additionally, a very light explanation of the Rowkeys section is here: 
https://kylin.apache.org/docs15/tutorial/create_cube.html
=
Rowkeys: the rowkeys are composed of the dimension-encoded values. “Dictionary” 
is the default encoding method; if a dimension is not a fit for dictionary 
encoding (e.g., cardinality > 10 million), select “false” and then enter the 
fixed length for that dimension, usually the max length of that column; if a 
value is longer than that size it will be truncated. Please note, without 
dictionary encoding, the cube size might be much bigger.

You can drag & drop a dimension column to adjust its position in the rowkey; 
put the mandatory dimensions at the beginning, followed by the dimensions 
heavily involved in filters (where conditions). Put high cardinality 
dimensions ahead of low cardinality dimensions.
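The two encodings the quotes above contrast can be sketched in a few lines. This is an illustration of the described behavior (padding/truncation vs. in-memory id mapping), not Kylin's actual encoder classes:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration of the two rowkey encodings contrasted above - not Kylin code.
// Fixed-length keeps the raw value, padded or truncated to N bytes; dictionary
// replaces the value with a compact integer id but must keep the map in memory.
public class RowkeyEncodings {
    // Fixed-length N: pad with spaces, truncate anything longer (as the docs warn).
    static byte[] fixedLength(String value, int n) {
        byte[] out = new byte[n];
        Arrays.fill(out, (byte) ' ');
        byte[] src = value.getBytes(StandardCharsets.UTF_8);
        System.arraycopy(src, 0, out, 0, Math.min(src.length, n));
        return out;
    }

    // Dictionary: every distinct value gets a small id; memory grows with cardinality,
    // which is exactly why high-cardinality dimensions are a problem for it.
    private final Map<String, Integer> dict = new LinkedHashMap<>();
    int dictionaryId(String value) {
        return dict.computeIfAbsent(value, v -> dict.size());
    }

    public static void main(String[] args) {
        System.out.println(new String(fixedLength("CUSTOMER_12345", 8),
                StandardCharsets.UTF_8)); // truncated to 8 bytes
        RowkeyEncodings enc = new RowkeyEncodings();
        System.out.println(enc.dictionaryId("foo"));
        System.out.println(enc.dictionaryId("bar"));
        System.out.println(enc.dictionaryId("foo")); // same id again - compact and stable
    }
}
```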

I.e. 

[jira] [Updated] (KYLIN-1836) Kylin 1.5+ New Aggregation Group - UI improvements

2016-06-28 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-1836:
--
Description: 
After reading the Tech Blog - 
https://kylin.apache.org/blog/2016/02/18/new-aggregation-group/ from Hongbin Ma 
- I got a few ideas, mentioned below, to help Cube designers understand the 
impact of their cube design on Build and Query performance:

(BTW: thank you for putting this Blog together !!! and thank you for 
referencing this blog through the Kylin UI - the link in the Aggregation Groups 
section !! - it is a very powerful optimization technique.)

Idea 1
=

It would be great if the Advanced Settings section of the UI could calculate 
the exact number of Cuboids defined by every Aggregation Group (# of 
combinations; # of pruned combinations (based on Hierarchy/Joint and Mandatory 
Dimensions)) and then also show the overall total of Cuboids considering ALL 
the defined Aggregation Groups.

Idea 2
=

The Aggregation Group section is about optimizing the # of necessary cuboids, 
assuming you know the query patterns. This is sometimes easy, but for more 
complex dashboards where multiple people work on defining the queries it is 
hard to control and guess. I would therefore suggest adding a new tab in the 
Monitor section of the Kylin UI - next to Jobs and Slow Queries, an additional 
tab "Non-satisfied Queries" showing the queries which Kylin was not able to 
evaluate - queries which end with a "No Realization" exception. Together with 
the query SQL (including all the parameters) it would help to show the 
"missing dimension name" used in the query which was the cause for not finding 
a proper Cuboid.


Idea 3
=
Can anyone also document the Rowkeys section in the same part of the UI 
(Advanced Settings)? It is not really clear what effect it will have if I 
start playing with the Rowkeys section (adding/removing dimension fields; 
adding non-dimension fields, ...). All I understand is that the "Rowkeys" 
section impacts only the HBase storage of calculated cuboids. Thus it doesn't 
impact Cube Build time that much (except that the trie for the dictionary 
needs to be built for every rowkey specified on this tab). I understand that 
the major impact of the Rowkeys section is therefore on HBase size / region 
splits and thus also on query execution time. 

What I am confused about is whether I can define a high-cardinality dimension 
in the Cube and remove it from the Rowkeys section. What would happen to the 
HBase storage and the expected query time ... would that dimension still be 
query-enabled ??

The closest explanation I found is this reply from Yu Feng here: 
http://apache-kylin.74782.x6.nabble.com/Relationship-between-rowkey-column-length-and-cube-size-td3174.html
==
Reply: Cube size determines how to split regions for the table in HBase after 
generating all cuboid files. For example, if all of your cuboid files total 
100GB and your cube size is set to "SMALL", and the property for SMALL is 
10GB, Kylin will create the HBase table with 10 regions. It will calculate the 
start rowkey and end rowkey of every region before creating the htable, then 
create the table with that split information. 

Rowkey column length is another thing: you can choose to either use a 
dictionary or set the rowkey column length for every dimension. If you use a 
dictionary, Kylin will build a dictionary for this column (a trie tree), which 
means every value of the dimension will be encoded as a unique number value; 
because the dimension value is part of the HBase rowkey, the dictionary 
reduces the HBase table size. However, Kylin stores the dictionary in memory, 
so if the dimension cardinality is large, that becomes a problem. If you set 
the rowkey column length to N for a dimension, Kylin will not build a 
dictionary for it, and every value will be cut to an N-length string - so no 
dictionary in memory, but the rowkey in the HBase table will be longer. 
==

Additionally, a very light explanation of the Rowkeys section is here: 
https://kylin.apache.org/docs15/tutorial/create_cube.html
=
Rowkeys: the rowkeys are composed of the dimension-encoded values. “Dictionary” 
is the default encoding method; if a dimension is not a fit for dictionary 
encoding (e.g., cardinality > 10 million), select “false” and then enter the 
fixed length for that dimension, usually the max length of that column; if a 
value is longer than that size it will be truncated. Please note, without 
dictionary encoding, the cube size might be much bigger.

You can drag & drop a dimension column to adjust its position in the rowkey; 
put the mandatory dimensions at the beginning, followed by the dimensions 
heavily involved in filters (where conditions). Put high cardinality 
dimensions ahead of low cardinality dimensions.

  

[jira] [Updated] (KYLIN-1836) Kylin 1.5+ New Aggregation Group - UI improvement

2016-06-28 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-1836:
--
Description: 
After reading the Tech Blog - 
https://kylin.apache.org/blog/2016/02/18/new-aggregation-group/ from Hongbin Ma 
- I got a few ideas, mentioned below, to help Cube designers understand the 
impact of their cube design on Build and Query performance:

(BTW: thank you for putting this Blog together !!! and thank you for 
referencing this blog through the Kylin UI - the link in the Aggregation Groups 
section !! - it is a very powerful optimization technique.)

Idea 1
=

It would be great if the Advanced Settings section of the UI could calculate 
the exact number of Cuboids defined by every Aggregation Group (# of 
combinations; # of pruned combinations (based on Hierarchy/Joint and Mandatory 
Dimensions)) and then also show the overall total of Cuboids considering ALL 
the defined Aggregation Groups.

Idea 2
=

The Aggregation Group section is about optimizing the # of necessary cuboids, 
assuming you know the query patterns. This is sometimes easy, but for more 
complex dashboards where multiple people work on defining the queries it is 
hard to control and guess. I would therefore suggest adding a new tab in the 
Monitor section of the Kylin UI - next to Jobs and Slow Queries, an additional 
tab "Non-satisfied Queries" showing the queries which Kylin was not able to 
evaluate - queries which end with a "No Realization" exception. Together with 
the query SQL (including all the parameters) it would help to show the 
"missing dimension name" used in the query which was the cause for not finding 
a proper Cuboid.


Idea 3
=
Can anyone also document the Rowkeys section in the same part of the UI 
(Advanced Settings)? It is not really clear what effect it will have if I 
start playing with the Rowkeys section (adding/removing dimension fields; 
adding non-dimension fields, ...). All I understand is that the "Rowkeys" 
section impacts only the HBase storage of calculated cuboids. Thus it doesn't 
impact Cube Build time that much (except that the trie for the dictionary 
needs to be built for every rowkey specified on this tab). I understand that 
the major impact of the Rowkeys section is therefore on HBase size / region 
splits and thus also on query execution time. 

What I am confused about is whether I can define a high-cardinality dimension 
in the Cube and remove it from the Rowkeys section. What would happen to the 
HBase storage and the expected query time ... would that dimension still be 
query-enabled ??

The closest explanation I found is this reply from Yu Feng here: 
http://apache-kylin.74782.x6.nabble.com/Relationship-between-rowkey-column-length-and-cube-size-td3174.html
==
Reply: Cube size determines how to split regions for the table in HBase after 
generating all cuboid files. For example, if all of your cuboid files total 
100GB and your cube size is set to "SMALL", and the property for SMALL is 
10GB, Kylin will create the HBase table with 10 regions. It will calculate the 
start rowkey and end rowkey of every region before creating the htable, then 
create the table with that split information. 

Rowkey column length is another thing: you can choose to either use a 
dictionary or set the rowkey column length for every dimension. If you use a 
dictionary, Kylin will build a dictionary for this column (a trie tree), which 
means every value of the dimension will be encoded as a unique number value; 
because the dimension value is part of the HBase rowkey, the dictionary 
reduces the HBase table size. However, Kylin stores the dictionary in memory, 
so if the dimension cardinality is large, that becomes a problem. If you set 
the rowkey column length to N for a dimension, Kylin will not build a 
dictionary for it, and every value will be cut to an N-length string - so no 
dictionary in memory, but the rowkey in the HBase table will be longer. 
==

  was:
After reading the Tech Blog - 
https://kylin.apache.org/blog/2016/02/18/new-aggregation-group/ from Hongbin Ma 
- I got a few ideas, mentioned below, to help Cube designers understand the 
impact of their cube design on Build and Query performance:

(BTW: thank you for putting this Blog together !!! and thank you for 
referencing this blog through the Kylin UI - the link in the Aggregation Groups 
section !! - it is a very powerful optimization technique.)

Idea 1
=

It would be great if the Advanced Settings section of the UI could calculate 
the exact number of Cuboids defined by every Aggregation Group (# of 
combinations; # of pruned combinations (based on Hierarchy/Joint and Mandatory 
Dimensions)) and then also show the overall total of Cuboids considering ALL 
the defined Aggregation Groups.

Idea 2
=

The Aggregation Group section is about optimizing the # of necessary cuboids 

[jira] [Updated] (KYLIN-1836) Kylin 1.5+ New Aggregation Group - UI improvements

2016-06-28 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-1836:
--
Summary: Kylin 1.5+ New Aggregation Group - UI improvements  (was: Kylin 
1.5+ New Aggregation Group - UI improvement)

> Kylin 1.5+ New Aggregation Group - UI improvements
> --
>
> Key: KYLIN-1836
> URL: https://issues.apache.org/jira/browse/KYLIN-1836
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v1.5.0, v1.5.1, v1.5.2, v1.5.3, v1.5.2.1
>Reporter: Richard Calaba
>
> After reading the Tech Blog - 
> https://kylin.apache.org/blog/2016/02/18/new-aggregation-group/ from Hongbin 
> Ma - I got a few ideas, mentioned below, to help Cube designers understand 
> the impact of their cube design on Build and Query performance:
> (BTW: thank you for putting this Blog together !!! and thank you for 
> referencing this blog through the Kylin UI - the link in the Aggregation 
> Groups section !! - it is a very powerful optimization technique.)
> Idea 1
> =
> It would be great if the Advanced Settings section of the UI could calculate 
> the exact number of Cuboids defined by every Aggregation Group (# of 
> combinations; # of pruned combinations (based on Hierarchy/Joint and 
> Mandatory Dimensions)) and then also show the overall total of Cuboids 
> considering ALL the defined Aggregation Groups.
> Idea 2
> =
> The Aggregation Group section is about optimizing the # of necessary 
> cuboids, assuming you know the query patterns. This is sometimes easy, but 
> for more complex dashboards where multiple people work on defining the 
> queries it is hard to control and guess. I would therefore suggest adding a 
> new tab in the Monitor section of the Kylin UI - next to Jobs and Slow 
> Queries, an additional tab "Non-satisfied Queries" showing the queries which 
> Kylin was not able to evaluate - queries which end with a "No Realization" 
> exception. Together with the query SQL (including all the parameters) it 
> would help to show the "missing dimension name" used in the query which was 
> the cause for not finding a proper Cuboid.
> Idea 3
> =
> Can anyone also document the Rowkeys section in the same part of the UI 
> (Advanced Settings)? It is not really clear what effect it will have if I 
> start playing with the Rowkeys section (adding/removing dimension fields; 
> adding non-dimension fields, ...). All I understand is that the "Rowkeys" 
> section impacts only the HBase storage of calculated cuboids. Thus it 
> doesn't impact Cube Build time that much (except that the trie for the 
> dictionary needs to be built for every rowkey specified on this tab). I 
> understand that the major impact of the Rowkeys section is therefore on 
> HBase size / region splits and thus also on query execution time. 
> What I am confused about is whether I can define a high-cardinality 
> dimension in the Cube and remove it from the Rowkeys section. What would 
> happen to the HBase storage and the expected query time ... would that 
> dimension still be query-enabled ??
> The closest explanation I found is this reply from Yu Feng here: 
> http://apache-kylin.74782.x6.nabble.com/Relationship-between-rowkey-column-length-and-cube-size-td3174.html
> ==
> Reply: Cube size determines how to split regions for the table in HBase 
> after generating all cuboid files. For example, if all of your cuboid files 
> total 100GB and your cube size is set to "SMALL", and the property for SMALL 
> is 10GB, Kylin will create the HBase table with 10 regions. It will 
> calculate the start rowkey and end rowkey of every region before creating 
> the htable, then create the table with that split information. 
> Rowkey column length is another thing: you can choose to either use a 
> dictionary or set the rowkey column length for every dimension. If you use a 
> dictionary, Kylin will build a dictionary for this column (a trie tree), 
> which means every value of the dimension will be encoded as a unique number 
> value; because the dimension value is part of the HBase rowkey, the 
> dictionary reduces the HBase table size. However, Kylin stores the 
> dictionary in memory, so if the dimension cardinality is large, that becomes 
> a problem. If you set the rowkey column length to N for a dimension, Kylin 
> will not build a dictionary for it, and every value will be cut to an 
> N-length string - so no dictionary in memory, but the rowkey in the HBase 
> table will be longer. 
> ==



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KYLIN-1836) Kylin 1.5+ New Aggregation Group - UI improvement

2016-06-28 Thread Richard Calaba (JIRA)
Richard Calaba created KYLIN-1836:
-

 Summary: Kylin 1.5+ New Aggregation Group - UI improvement
 Key: KYLIN-1836
 URL: https://issues.apache.org/jira/browse/KYLIN-1836
 Project: Kylin
  Issue Type: Improvement
Affects Versions: v1.5.2, v1.5.1, v1.5.0, v1.5.3, v1.5.2.1
Reporter: Richard Calaba


After reading the Tech Blog 
https://kylin.apache.org/blog/2016/02/18/new-aggregation-group/ from Hongbin Ma, 
I got a few ideas, mentioned below, to help cube designers understand the 
impact of their cube design on Build and Query performance:

(BTW: Thank you for putting this blog together, and thank you for referencing 
it from the Kylin UI via the link in the Aggregation Groups section - it is a 
very powerful optimization technique.)

Idea 1
=

It would be great if the Advanced Settings section of the UI could calculate 
the exact number of cuboids defined by every Aggregation Group (the number of 
combinations and the number of pruned combinations, based on Hierarchy/Joint 
and Mandatory dimensions), and then also show the overall total of cuboids 
across ALL the defined Aggregation Groups.
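
To make Idea 1 concrete, here is a hypothetical sketch (my own, not actual 
Kylin code) of the per-group combination count the UI could display, following 
the pruning rules from the aggregation-group blog post: a mandatory dimension 
contributes a factor of 1, a hierarchy of h levels contributes h+1, a joint 
group behaves like a single dimension, and every normal dimension doubles the 
count.

```java
// Hypothetical sketch (not actual Kylin code) of the cuboid-combination count
// for a single aggregation group, following the pruning rules from the blog post.
public class CuboidCountSketch {

    // normalDims: dimensions with no special rule (each doubles the count);
    // hierarchySizes: number of levels per hierarchy (h levels -> h+1 choices);
    // jointGroups: number of joint groups (each behaves like one dimension);
    // mandatory dimensions contribute a factor of 1 and are simply omitted.
    static long cuboidCombinations(int normalDims, int[] hierarchySizes, int jointGroups) {
        long total = 1L << normalDims;
        for (int h : hierarchySizes) {
            total *= (h + 1);
        }
        total <<= jointGroups;
        return total; // includes the empty cuboid for this group
    }

    public static void main(String[] args) {
        // 3 normal dims, one 3-level hierarchy, one joint group:
        // 2^3 * (3+1) * 2^1 = 64 combinations
        System.out.println(cuboidCombinations(3, new int[]{3}, 1)); // 64
    }
}
```

The overall total the UI would show is then the sum of this value over all 
defined aggregation groups (minus cuboids shared between groups).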

Idea 2
=

The Aggregation Group section is about minimizing the number of necessary 
cuboids, assuming you know the query patterns. This is sometimes easy, but for 
more complex dashboards where multiple people define the queries it is hard to 
control and guess. I would therefore suggest adding a new tab in the Monitor 
section of the Kylin UI - next to Jobs and Slow Queries - called 
"Non-satisfied Queries", showing the queries Kylin was not able to evaluate, 
i.e. queries which end with a "No Realization" exception. Together with the 
query SQL (including all parameters), it would help to show the "missing 
dimension name" used in the query which was the cause for not finding a proper 
cuboid.


Idea 3
=
Can anyone also document the Rowkeys section in the same part of the UI 
(Advanced Settings)? It is not really clear what effect it has if I start 
changing the Rowkeys section (adding/removing dimension fields; adding 
non-dimension fields, ...). All I understand is that the "Rowkeys" section 
affects only the HBase storage of calculated cuboids, so it doesn't affect 
Cube Build time much (only that the Trie for the dictionary needs to be built 
for every specified rowkey) - its major impact is on HBase size / region 
splits, and therefore also on query time. 

What I am, for example, confused about is whether I can define a 
high-cardinality dimension in the Cube and remove it from the Rowkeys 
section. What would happen to the HBase storage and the expected query time ...

The closest explanation I found is this reply from Yu Feng:
http://apache-kylin.74782.x6.nabble.com/Relationship-between-rowkey-column-length-and-cube-size-td3174.html
==
Reply: Cube size determines how to split regions for the table in HBase after 
generating all cuboid files. For example, if your total cuboid file size is 
100GB, your cube size is set to "SMALL", and the property for SMALL is 10GB, 
Kylin will create the HBase table with 10 regions. It calculates the start 
rowkey and end rowkey of every region before creating the htable, then 
creates the table with those split points. 

Rowkey column length is another thing: you can either use a dictionary or set 
a rowkey column length for every dimension. If you use a dictionary, Kylin 
builds a dictionary (a Trie tree) for the column, meaning every value of the 
dimension is encoded as a unique number; because the dimension value is part 
of the HBase rowkey, the dictionary reduces the HBase table size. However, 
Kylin stores the dictionary in memory, so if the dimension cardinality is 
large, that becomes a problem. If you set the rowkey column length to N for a 
dimension, Kylin will not build a dictionary for it, and every value is cut 
to an N-length string; so there is no dictionary in memory, but the rowkey in 
the HBase table is longer. 
==
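
As a rough illustration of the two points in the reply (my own sketch, not 
Kylin's implementation): the region count derived from the total cuboid size, 
and fixed-length encoding cutting values to N characters.

```java
// My own sketch, not Kylin's implementation: how the region count and the
// fixed-length rowkey encoding from the reply above could be computed.
public class RowkeySketch {

    // Regions = total cuboid size divided by the per-region size of the
    // chosen cube-size setting (e.g. SMALL = 10GB), rounded up.
    static int regionCount(long totalMb, long perRegionMb) {
        return (int) ((totalMb + perRegionMb - 1) / perRegionMb);
    }

    // Fixed-length encoding: cut the value to N characters (pad when shorter);
    // no dictionary is kept in memory, but the rowkey gets longer.
    static String fixedLength(String value, int n) {
        return value.length() >= n
                ? value.substring(0, n)
                : String.format("%-" + n + "s", value);
    }

    public static void main(String[] args) {
        // 100GB of cuboid files with SMALL (10GB per region) -> 10 regions
        System.out.println(regionCount(100 * 1024, 10 * 1024)); // 10
        System.out.println(fixedLength("CUSTOMER_12345678", 8)); // "CUSTOMER"
    }
}
```

Note the truncation in `fixedLength` also shows the risk of a too-small N: two 
distinct values sharing the first N characters become indistinguishable.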





[jira] [Created] (KYLIN-1835) Error: java.lang.NumberFormatException: For input count_distinct on Big Int ??? (#7 Step Name: Build Base Cuboid Data)

2016-06-28 Thread Richard Calaba (JIRA)
Richard Calaba created KYLIN-1835:
-

 Summary: Error: java.lang.NumberFormatException: For input 
count_distinct on Big Int ??? (#7 Step Name: Build Base Cuboid Data)
 Key: KYLIN-1835
 URL: https://issues.apache.org/jira/browse/KYLIN-1835
 Project: Kylin
  Issue Type: Bug
Affects Versions: v1.5.2, v1.5.2.1
Reporter: Richard Calaba
Priority: Critical


I believe I have discovered an error in Kylin related to count_distinct with 
exact precision.

I am not 100% sure, but everything points to a design limit for 
count_distinct ... please assess / confirm / reject my observation.

Background info:
=
- large fact table ~ 100 mio rows.
- large customer dimension ~ 10 mio rows

Defined 2 KPIs of type COUNT_DISTINCT with exact precision (return type 
"bitmap") on 2 high-cardinality fields of type Bigint.

The Cube Build runs fine until "#7 Step Name: Build Base Cuboid Data", where 
it errors out without further details in the Kylin log - it shows only "no 
counters for job job_1463699962519_16085".

The MR logs of job job_1463699962519_16085 show this exception:

2016-06-28 02:22:24,019 WARN [main] org.apache.hadoop.mapred.YarnChild: 
Exception running child : java.lang.NumberFormatException: For input string: 
"-6628245177096591402"
at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:495)
at java.lang.Integer.parseInt(Integer.java:527)
at 
org.apache.kylin.measure.bitmap.BitmapCounter.add(BitmapCounter.java:63)
at 
org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:106)
at 
org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:98)
at 
org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValueOf(BaseCuboidMapperBase.java:189)
at 
org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValue(BaseCuboidMapperBase.java:159)
at 
org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.outputKV(BaseCuboidMapperBase.java:206)
at 
org.apache.kylin.engine.mr.steps.HiveToBaseCuboidMapper.map(HiveToBaseCuboidMapper.java:53)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

Just reading the signature of the exception and connecting it to the measure 
return type "bitmap": it looks like the failure occurs because I chose exact 
precision (which the UI says is supported for int types) while passing a 
Bigint field.

If so, is that a bug or a design limitation? Can count_distinct not be 
implemented for bigint (with exact precision), or do I have to use 
count_distinct with an error rate instead?
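
The suspicion can be reproduced outside Kylin: per the stack trace, 
BitmapCounter.add reaches Integer.parseInt, so any value outside the 32-bit 
int range (such as the bigint value in the MR log) fails in exactly this way.

```java
// Minimal reproduction of the suspected failure mode: per the stack trace,
// BitmapCounter.add reaches Integer.parseInt, so any value outside the
// 32-bit int range (like a bigint column value) fails the same way.
public class BitmapIntLimit {
    public static void main(String[] args) {
        String bigintValue = "-6628245177096591402"; // value from the MR log above
        try {
            Integer.parseInt(bigintValue); // int range is only about +/- 2.1 billion
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage()); // For input string: "-6628245177096591402"
        }
    }
}
```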







[jira] [Updated] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-06-28 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-1834:
--
Attachment: job_2016_06_28_09_59_12-value-not-found.zip

> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build 
> Dimension Dictionary
> --
>
> Key: KYLIN-1834
> URL: https://issues.apache.org/jira/browse/KYLIN-1834
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Critical
> Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>
> Getting exception in Step 4 - Build Dimension Dictionary:
> java.lang.IllegalArgumentException: Value not exists!
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
>   at 
> org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2
> The code which generates the exception is:
> org.apache.kylin.dimension.Dictionary.java:
>  /**
>  * A lower level API, return ID integer from raw value bytes. In case of 
> not found 
>  * 
>  * - if roundingFlag=0, throw IllegalArgumentException; 
>  * - if roundingFlag<0, the closest smaller ID integer if exist; 
>  * - if roundingFlag>0, the closest bigger ID integer if exist. 
>  * 
>  * Bypassing the cache layer, this could be significantly slower than 
> getIdFromValue(T value).
>  * 
>  * @throws IllegalArgumentException
>  * if value is not found in dictionary and rounding is off;
>  * or if rounding cannot find a smaller or bigger ID
>  */
> final public int getIdFromValueBytes(byte[] value, int offset, int len, 
> int roundingFlag) throws IllegalArgumentException {
> if (isNullByteForm(value, offset, len))
> return nullId();
> else {
> int id = getIdFromValueBytesImpl(value, offset, len, 
> roundingFlag);
> if (id < 0)
> throw new IllegalArgumentException("Value not exists!");
> return id;
> }
> } 
> ==
> The Cube is big - fact 110 mio rows, the largest dimension (customer) has 10 
> mio rows. I have increased the JVM -Xmx to 16gb and set the 
> kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube 
> build doesn't fail (previously we were getting exception complaining about 
> the 300MB limit for Dimension dictionary size (req. approx 700MB)).
> ==
> Before that we were getting exception complaining about the Dictionary 
> encoding problem - "Too high cardinality is not suitable for dictionary -- 
> cardinality: 10873977" - this we resolved by changing the affected 
> dimension/row key Encoding from "dict" to "int; length=8" on the Advanced 
> Settings of the Cube.
> ==
> We have 2 high-cardinality fields (one from fact table and one from the big 
> dimension (customer - see above). We need to 

[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-06-28 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353486#comment-15353486
 ] 

Richard Calaba commented on KYLIN-1834:
---

The exception is preceded by an additional error explaining which value is the 
problem:

2016-06-28 08:07:35,215 ERROR [pool-2-thread-1] dict.TrieDictionary:173 : Not a 
valid value: -4270603867011174754
2016-06-28 08:07:35,220 ERROR [pool-2-thread-1] execution.AbstractExecutable:62 
: error execute 
HadoopShellExecutable{id=9549b14f-d25c-408b-8027-841f4fb94298-03, name=Build 
Dimension Dictionary, state=RUNNING}
java.lang.IllegalArgumentException: Value not exists! 

--- checking now which field it relates to ... 
--- also adding the collected Diagnostic logs

> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build 
> Dimension Dictionary
> --
>
> Key: KYLIN-1834
> URL: https://issues.apache.org/jira/browse/KYLIN-1834
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Priority: Critical
> Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>
> Getting exception in Step 4 - Build Dimension Dictionary:
> java.lang.IllegalArgumentException: Value not exists!
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
>   at 
> org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2
> The code which generates the exception is:
> org.apache.kylin.dimension.Dictionary.java:
>  /**
>  * A lower level API, return ID integer from raw value bytes. In case of 
> not found 
>  * 
>  * - if roundingFlag=0, throw IllegalArgumentException; 
>  * - if roundingFlag<0, the closest smaller ID integer if exist; 
>  * - if roundingFlag>0, the closest bigger ID integer if exist. 
>  * 
>  * Bypassing the cache layer, this could be significantly slower than 
> getIdFromValue(T value).
>  * 
>  * @throws IllegalArgumentException
>  * if value is not found in dictionary and rounding is off;
>  * or if rounding cannot find a smaller or bigger ID
>  */
> final public int getIdFromValueBytes(byte[] value, int offset, int len, 
> int roundingFlag) throws IllegalArgumentException {
> if (isNullByteForm(value, offset, len))
> return nullId();
> else {
> int id = getIdFromValueBytesImpl(value, offset, len, 
> roundingFlag);
> if (id < 0)
> throw new IllegalArgumentException("Value not exists!");
> return id;
> }
> } 
> ==
> The Cube is big - fact 110 mio rows, the largest dimension (customer) has 10 
> mio rows. I have increased the JVM -Xmx to 16gb and set the 
> kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube 
> build doesn't fail (previously we were getting exception complaining about 
> the 300MB limit for Dimension dictionary size (req. approx 700MB)).
> 

[jira] [Updated] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-06-28 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-1834:
--
Description: 
Getting exception in Step 4 - Build Dimension Dictionary:

java.lang.IllegalArgumentException: Value not exists!
at 
org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
at 
org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
at 
org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
at 
org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
at 
org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
at 
org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
at 
org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
at 
org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
at 
org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
at 
org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at 
org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
at 
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
at 
org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
at 
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
at 
org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

result code:2


The code which generates the exception is:

org.apache.kylin.dimension.Dictionary.java:

 /**
 * A lower level API, return ID integer from raw value bytes. In case of 
not found 
 * 
 * - if roundingFlag=0, throw IllegalArgumentException; 
 * - if roundingFlag<0, the closest smaller ID integer if exist; 
 * - if roundingFlag>0, the closest bigger ID integer if exist. 
 * 
 * Bypassing the cache layer, this could be significantly slower than 
getIdFromValue(T value).
 * 
 * @throws IllegalArgumentException
 * if value is not found in dictionary and rounding is off;
 * or if rounding cannot find a smaller or bigger ID
 */
final public int getIdFromValueBytes(byte[] value, int offset, int len, int 
roundingFlag) throws IllegalArgumentException {
if (isNullByteForm(value, offset, len))
return nullId();
else {
int id = getIdFromValueBytesImpl(value, offset, len, roundingFlag);
if (id < 0)
throw new IllegalArgumentException("Value not exists!");
return id;
}
} 
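
A hypothetical illustration of the roundingFlag contract described in the 
Javadoc above, using a sorted int array in place of the real trie dictionary 
(the array index stands in for the dictionary ID):

```java
// Hypothetical illustration of the roundingFlag contract from the Javadoc
// above, with a sorted array standing in for the real trie dictionary.
import java.util.Arrays;

public class RoundingLookup {
    static int getId(int[] sortedValues, int value, int roundingFlag) {
        int pos = Arrays.binarySearch(sortedValues, value);
        if (pos >= 0) return pos;                  // exact hit: index is the ID
        int insertion = -pos - 1;                  // where the value would be inserted
        if (roundingFlag < 0 && insertion > 0)
            return insertion - 1;                  // closest smaller ID, if any
        if (roundingFlag > 0 && insertion < sortedValues.length)
            return insertion;                      // closest bigger ID, if any
        throw new IllegalArgumentException("Value not exists!");
    }

    public static void main(String[] args) {
        int[] dict = {10, 20, 30};
        System.out.println(getId(dict, 20, 0));    // 1 (exact match)
        System.out.println(getId(dict, 25, -1));   // 1 (rounds down to 20)
        System.out.println(getId(dict, 25, 1));    // 2 (rounds up to 30)
        // getId(dict, 25, 0) would throw "Value not exists!" - the case hit here,
        // since the snapshot build looks values up with rounding off.
    }
}
```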

==

The Cube is big - fact 110 mio rows, the largest dimension (customer) has 10 
mio rows. I have increased the JVM -Xmx to 16gb and set the 
kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube 
build doesn't fail (previously we were getting exception complaining about the 
300MB limit for Dimension dictionary size (req. approx 700MB)).

==

Before that we were getting exception complaining about the Dictionary encoding 
problem - "Too high cardinality is not suitable for dictionary -- cardinality: 
10873977" - this we resolved by changing the affected dimension/row key 
Encoding from "dict" to "int; length=8" on the Advanced Settings of the Cube.

==

We have 2 high-cardinality fields (one from the fact table and one from the 
big customer dimension - see above) which we need to use in count_distinct 
measures for our calculations. I wonder if this "Value not exists!" exception 
is somehow related. Those count_distinct measures are defined one with return 
type "bitmap" (exact precision - only for int columns) and the second with 
return type "hllc16" (error rate <= 1.22 %).

==

I am looking for any clues to debug the cause of this error and a way to 
circumvent it ... 

  was:
Getting exception in Step 4 - Build Dimension Dictionary:

java.lang.IllegalArgumentException: Value not exists!
at 
org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
at 

[jira] [Updated] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-06-28 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-1834:
--
Description: 
Getting exception in Step 4 - Build Dimension Dictionary:

java.lang.IllegalArgumentException: Value not exists!
at 
org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
at 
org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
at 
org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
at 
org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
at 
org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
at 
org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
at 
org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
at 
org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
at 
org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
at 
org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at 
org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
at 
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
at 
org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
at 
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
at 
org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

result code:2


The code which generates the exception is:

org.apache.kylin.dimension.Dictionary.java:

 /**
 * A lower level API, return ID integer from raw value bytes. In case of 
not found 
 * 
 * - if roundingFlag=0, throw IllegalArgumentException; 
 * - if roundingFlag<0, the closest smaller ID integer if exist; 
 * - if roundingFlag>0, the closest bigger ID integer if exist. 
 * 
 * Bypassing the cache layer, this could be significantly slower than 
getIdFromValue(T value).
 * 
 * @throws IllegalArgumentException
 * if value is not found in dictionary and rounding is off;
 * or if rounding cannot find a smaller or bigger ID
 */
final public int getIdFromValueBytes(byte[] value, int offset, int len, int 
roundingFlag) throws IllegalArgumentException {
if (isNullByteForm(value, offset, len))
return nullId();
else {
int id = getIdFromValueBytesImpl(value, offset, len, roundingFlag);
if (id < 0)
throw new IllegalArgumentException("Value not exists!");
return id;
}
} 

==

The Cube is big - fact 110 mio rows, the largest dimension (customer) has 15 
mio rows. I have increased the JVM -Xmx to 16gb and set the 
kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube 
build doesn't fail (previously we were getting exception complaining about the 
300MB limit for Dimension dictionary size (req. approx 700MB)).

==

Before that we were getting exception complaining about the Dictionary encoding 
problem - "Too high cardinality is not suitable for dictionary -- cardinality: 
10873977" - this we resolved by changing the affected dimension/row key 
Encoding from "dict" to "int; length=8" on the Advanced Settings of the Cube.

==

We have 2 high-cardinality fields (one from the fact table and one from the 
big customer dimension - see above) which we need to use in count_distinct 
measures for our calculations. I wonder if this "Value not exists!" exception 
is somehow related. Those count_distinct measures are defined one with return 
type "bitmap" (exact precision - only for int columns) and the second with 
return type "hllc16" (error rate <= 1.22 %).

==

I am looking for any clues to debug the cause of this error and a way to 
circumvent it ... 

  was:
Getting exception in Step 4 - Build Dimension Dictionary:

java.lang.IllegalArgumentException: Value not exists!
at 
org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
at 

[jira] [Updated] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-06-28 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-1834:
--
Description: 
Getting exception in Step 4 - Build Dimension Dictionary:

java.lang.IllegalArgumentException: Value not exists!
at 
org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
at 
org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
at 
org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
at 
org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
at 
org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
at 
org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
at 
org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
at 
org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
at 
org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
at 
org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at 
org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
at 
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
at 
org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
at 
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
at 
org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

result code:2


The code which generates the exception is:

org.apache.kylin.dimension.Dictionary.java:

 /**
 * A lower level API, return ID integer from raw value bytes. In case of 
not found 
 * 
 * - if roundingFlag=0, throw IllegalArgumentException; 
 * - if roundingFlag<0, the closest smaller ID integer if exist; 
 * - if roundingFlag>0, the closest bigger ID integer if exist. 
 * 
 * Bypassing the cache layer, this could be significantly slower than 
getIdFromValue(T value).
 * 
 * @throws IllegalArgumentException
 * if value is not found in dictionary and rounding is off;
 * or if rounding cannot find a smaller or bigger ID
 */
final public int getIdFromValueBytes(byte[] value, int offset, int len, int 
roundingFlag) throws IllegalArgumentException {
if (isNullByteForm(value, offset, len))
return nullId();
else {
int id = getIdFromValueBytesImpl(value, offset, len, roundingFlag);
if (id < 0)
throw new IllegalArgumentException("Value not exists!");
return id;
}
} 

==

The Cube is big - fact 110 mio rows, the largest dimension (customer) has 15 
mio rows. I have increased the JVM -Xmx to 16gb and set the 
kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube 
build doesn't fail (previously we were getting exception complaining about the 
300MB limit for Dimension dictionary size (req. approx 700MB)).

==

Before that we were getting exception complaining about the Dictionary encoding 
problem - "Too high cardinality is not suitable for dictionary -- cardinality: 
10873977" - this we resolved by changing the affected Encoding from "dict" to 
"int; length=8"  

==

Those 2 high-cardinality fields (one from the fact table and one from the big 
dimension - see above) we need to use in count_distinct measures for our 
calculations. I wonder if this is somehow related. 

==

I am looking for any clues to debug the cause of this error and a way to 
circumvent it ... 

  was:
Getting exception in Step 4 - Build Dimension Dictionary:

java.lang.IllegalArgumentException: Value not exists!
at 
org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
at 
org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
at 
org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
at 
org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
at 

[jira] [Created] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-06-28 Thread Richard Calaba (JIRA)
Richard Calaba created KYLIN-1834:
-

 Summary: java.lang.IllegalArgumentException: Value not exists! - 
in Step 4 - Build Dimension Dictionary
 Key: KYLIN-1834
 URL: https://issues.apache.org/jira/browse/KYLIN-1834
 Project: Kylin
  Issue Type: Bug
Affects Versions: v1.5.2, v1.5.2.1
Reporter: Richard Calaba
Priority: Critical


Getting exception in Step 4 - Build Dimension Dictionary:

java.lang.IllegalArgumentException: Value not exists!
at 
org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
at 
org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
at 
org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
at 
org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
at 
org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
at 
org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
at 
org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
at 
org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
at 
org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
at 
org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at 
org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
at 
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
at 
org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
at 
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
at 
org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

result code:2


The code which generates the exception is:

org.apache.kylin.dimension.Dictionary.java:

    /**
     * A lower level API, return ID integer from raw value bytes. In case of not found
     *
     * - if roundingFlag=0, throw IllegalArgumentException;
     * - if roundingFlag<0, the closest smaller ID integer if exist;
     * - if roundingFlag>0, the closest bigger ID integer if exist.
     *
     * Bypassing the cache layer, this could be significantly slower than getIdFromValue(T value).
     *
     * @throws IllegalArgumentException
     *             if value is not found in dictionary and rounding is off;
     *             or if rounding cannot find a smaller or bigger ID
     */
    final public int getIdFromValueBytes(byte[] value, int offset, int len, int roundingFlag) throws IllegalArgumentException {
        if (isNullByteForm(value, offset, len))
            return nullId();
        else {
            int id = getIdFromValueBytesImpl(value, offset, len, roundingFlag);
            if (id < 0)
                throw new IllegalArgumentException("Value not exists!");
            return id;
        }
    }
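The roundingFlag contract above can be illustrated with a tiny standalone sketch. It uses a plain sorted int[] as a hypothetical stand-in for the dictionary, not Kylin's actual TrieDictionary:

```java
import java.util.Arrays;

// Minimal sketch of the roundingFlag contract described in the javadoc above,
// using a sorted int[] (hypothetical stand-in, not Kylin's Trie implementation).
public class RoundingLookup {
    // Returns the id for `value`; on a miss, rounds down/up per roundingFlag,
    // or throws IllegalArgumentException when roundingFlag == 0.
    static int getId(int[] sortedValues, int value, int roundingFlag) {
        int pos = Arrays.binarySearch(sortedValues, value);
        if (pos >= 0)
            return pos;                      // exact hit: id == index
        int insertion = -pos - 1;            // index of first element > value
        if (roundingFlag < 0 && insertion > 0)
            return insertion - 1;            // closest smaller id
        if (roundingFlag > 0 && insertion < sortedValues.length)
            return insertion;                // closest bigger id
        throw new IllegalArgumentException("Value not exists!");
    }

    public static void main(String[] args) {
        int[] dict = {10, 20, 30};
        System.out.println(getId(dict, 20, 0));   // exact match
        System.out.println(getId(dict, 25, -1));  // rounds down to id of 20
        System.out.println(getId(dict, 25, 1));   // rounds up to id of 30
        try {
            getId(dict, 25, 0);                   // miss with rounding off
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With roundingFlag=0 a miss throws the same "Value not exists!" message seen in the build log above.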

==

The Cube is big - the fact table has 110 million rows, and the largest dimension 
(customer) has 15 million entries. I have increased the JVM -Xmx to 16gb and set 
kylin.table.snapshot.max_mb=20148 in kylin.properties to make sure the Cube 
build doesn't fail (previously we were getting an exception complaining about the 
300MB limit on the dimension dictionary size (required approx. 700MB)).

==

Before that we were getting an exception complaining about a dictionary encoding 
problem - "Too high cardinality is not suitable for dictionary -- cardinality: 
10873977" - which we resolved by changing the affected encoding from "dict" to 
"int; length=8"  

==

Those 2 high-cardinality fields (one from the fact table and one from the big 
dimension, see above) need to be used in a distinct_count measure for our 
calculations. I wonder whether this is related.

==

I am looking for any clues to debug the cause of this error and for a way to 
circumvent it ... 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KYLIN-1829) Add execution of "utility" classes to the System tab of Kylin UI

2016-06-27 Thread Richard Calaba (JIRA)
Richard Calaba created KYLIN-1829:
-

 Summary: Add execution of "utility" classes to the System tab of 
Kylin UI
 Key: KYLIN-1829
 URL: https://issues.apache.org/jira/browse/KYLIN-1829
 Project: Kylin
  Issue Type: Improvement
Reporter: Richard Calaba


There are a bunch of "hidden" and/or semi-documented classes in the Kylin engine 
which can be very useful for standard maintenance of a healthy Kylin instance - 
it would be very good if those were connected to the System tab of the Kylin UI 
so administrators can run them directly from the Kylin UI and also collect their 
execution logs there. A few candidates:

1) ./bin/kylin.sh org.apache.kylin.storage.hbase.util.StorageCleanupJob
2) ./bin/kylin.sh org.apache.kylin.cube.cli.CubeSignatureRefresher
3) ./bin/metastore.sh for backup/restore actions, with support to list the 
content of the ./meta_backups directory

In addition, I found many Kylin engine classes with a main method (thus executable 
tools for Kylin) which might also be good candidates to integrate here - 
I just do not know their functions/parameters; to find out I would have to read the 
code of all of them. Maybe the authors of the tools can create an official list 
documenting the supported parameters - and this list/documentation could also be 
part of the System tab in the UI.

A few other classes I found - executable through kylin.sh - some of them are 
probably already connected to the UI in 1.5.2.1 (like the diagnostics CLI ..).

I still do not know what some of them are for, though the classes 
having CLI in the name give the impression that they should be "stable" internal 
interfaces to the Kylin engine.

./bin/kylin.sh org.apache.kylin.tool.CubeMetaExtractor
./bin/kylin.sh org.apache.kylin.tool.DiagnosisInfoCLI
./bin/kylin.sh org.apache.kylin.tool.HBaseUsageExtractor
./bin/kylin.sh org.apache.kylin.tool.JobDiagnosisInfoCLI
./bin/kylin.sh org.apache.kylin.job.hadoop.invertedindex.IICLI - error
./bin/kylin.sh org.apache.kylin.common.KylinVersion - this would be good to 
have in the UI for sure -> sometimes I need to know the exact Kylin version
./bin/kylin.sh org.apache.kylin.common.persistence.ResourceTool  - this is 
basically metastore.sh
./bin/kylin.sh org.apache.kylin.cube.cli.DumpDictionaryCLI
./kylin.sh org.apache.kylin.cube.cuboid.CuboidCLI
./kylin.sh org.apache.kylin.query.QueryCli

... and some others 





[jira] [Comment Edited] (KYLIN-1828) java.lang.StringIndexOutOfBoundsException in org.apache.kylin.storage.hbase.util.StorageCleanupJob

2016-06-27 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352252#comment-15352252
 ] 

Richard Calaba edited comment on KYLIN-1828 at 6/28/16 3:10 AM:


An additional workaround for those who need to clean up the hive tables manually:


To remove ALL kylin_intermediate hive tables from the default schema I ran:

hive -e 'use default;show tables "kylin_intermediate_*";' | xargs -I '{}' hive 
-e 'use default;drop table {}'

Then running ./bin/kylin.sh 
org.apache.kylin.storage.hbase.util.StorageCleanupJob 2 times seems to have done 
the additional cleanups as well ...


was (Author: cal...@gmail.com):
Additional workaround for those who need to cleanup the hive tables manually:


To remove ALL kylin_intermediate hive tables from default schema I run:

hive -e 'use default;show tables "kylin_intermediate_*";' | xargs -I '{}' hive 
-e 'use default;drop table {}'

> java.lang.StringIndexOutOfBoundsException in 
> org.apache.kylin.storage.hbase.util.StorageCleanupJob
> --
>
> Key: KYLIN-1828
> URL: https://issues.apache.org/jira/browse/KYLIN-1828
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v1.5.2.1
>Reporter: Richard Calaba
>
> While running storage cleanup job:
> ./bin/kylin.sh org.apache.kylin.storage.hbase.util.StorageCleanupJob --delete 
> true
> I see Hive tables in form 
> kylin_intermediate__1970010100_20160701031500
>  in the default schema.
> While running the above storage cleaner (v.1.5.2.1 - all previously built 
> Cubes Disabled & Dropped) I am getting an error:
> 2016-06-27 22:28:08,480 INFO  [main StorageCleanupJob:262]: Remove 
> intermediate hive table with job id fc44da88-cffc-4710-8726-ff910cf83451 with 
> job status ERROR
> usage: StorageCleanupJob
>  -deleteDelete the unused storage
> Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String 
> index out of range: -2
> at java.lang.String.substring(String.java:1904)
> at 
> org.apache.kylin.storage.hbase.util.StorageCleanupJob.cleanUnusedIntermediateHiveTable(StorageCleanupJob.java:269)
> at 
> org.apache.kylin.storage.hbase.util.StorageCleanupJob.run(StorageCleanupJob.java:91)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.kylin.storage.hbase.util.StorageCleanupJob.main(StorageCleanupJob.java:308)
> 2016-06-27 22:28:08,486 INFO  [Thread-0 
> HConnectionManager$HConnectionImplementation:1907]: Closing zookeeper 
> sessionid=0x154c97461586119
> 2016-06-27 22:28:08,491 INFO  [Thread-0 ZooKeeper:684]: Session: 
> 0x154c97461586119 closed
> 2016-06-27 22:28:08,491 INFO  [main-EventThread ClientCnxn:509]: EventThread 
> shut down





[jira] [Updated] (KYLIN-1828) java.lang.StringIndexOutOfBoundsException in org.apache.kylin.storage.hbase.util.StorageCleanupJob

2016-06-27 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-1828:
--
Component/s: Job Engine

> java.lang.StringIndexOutOfBoundsException in 
> org.apache.kylin.storage.hbase.util.StorageCleanupJob
> --
>
> Key: KYLIN-1828
> URL: https://issues.apache.org/jira/browse/KYLIN-1828
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v1.5.2.1
>Reporter: Richard Calaba
>
> While running storage cleanup job:
> ./bin/kylin.sh org.apache.kylin.storage.hbase.util.StorageCleanupJob --delete 
> true
> I see Hive tables in form 
> kylin_intermediate__1970010100_20160701031500
>  in the default schema.
> While running the above storage cleaner (v.1.5.2.1 - all previously built 
> Cubes Disabled & Dropped) I am getting an error:
> 2016-06-27 22:28:08,480 INFO  [main StorageCleanupJob:262]: Remove 
> intermediate hive table with job id fc44da88-cffc-4710-8726-ff910cf83451 with 
> job status ERROR
> usage: StorageCleanupJob
>  -deleteDelete the unused storage
> Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String 
> index out of range: -2
> at java.lang.String.substring(String.java:1904)
> at 
> org.apache.kylin.storage.hbase.util.StorageCleanupJob.cleanUnusedIntermediateHiveTable(StorageCleanupJob.java:269)
> at 
> org.apache.kylin.storage.hbase.util.StorageCleanupJob.run(StorageCleanupJob.java:91)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.kylin.storage.hbase.util.StorageCleanupJob.main(StorageCleanupJob.java:308)
> 2016-06-27 22:28:08,486 INFO  [Thread-0 
> HConnectionManager$HConnectionImplementation:1907]: Closing zookeeper 
> sessionid=0x154c97461586119
> 2016-06-27 22:28:08,491 INFO  [Thread-0 ZooKeeper:684]: Session: 
> 0x154c97461586119 closed
> 2016-06-27 22:28:08,491 INFO  [main-EventThread ClientCnxn:509]: EventThread 
> shut down





[jira] [Commented] (KYLIN-1828) java.lang.StringIndexOutOfBoundsException in org.apache.kylin.storage.hbase.util.StorageCleanupJob

2016-06-27 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352250#comment-15352250
 ] 

Richard Calaba commented on KYLIN-1828:
---

Further analysis of the problem:

StorageCleanupJob.java, lines 266 - 270:

266    while ((line = reader.readLine()) != null) {
267        if (line.startsWith("kylin_intermediate_")) {
268            boolean isNeedDel = false;
269            String uuid = line.substring(line.length() - uuidLength, line.length());
270            uuid = uuid.replace("_", "-");

Obviously the "String uuid = line.substring(line.length() - uuidLength, 
line.length());" on line 269 is causing the string-index-out-of-bounds exception. 
I am not sure why - I do not have enough info about the assumed pattern for the 
table names - but in my hive DB (default schema) I see kylin table names containing 
the cube (or model) name and not a uuid (kylin_intermediate_ - maybe 
this is the cause of the problem, not sure.

The exception raised at line 269 then causes a bail-out from the method 
cleanUnusedIntermediateHiveTable through the call stack, producing this 
additional, confusing message in the log:

usage: StorageCleanupJob
-delete  Delete the unused storage

which indicates that the class was not called with correct parameters - BUT it 
was, according to 
https://kylin.apache.org/docs/howto/howto_cleanup_storage.html.
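The failure mode described above can be reproduced in isolation. The sketch below assumes a 36-character uuid suffix (the actual uuidLength value in StorageCleanupJob is not shown in the snippet) and a table name that carries a cube name instead of a uuid; the "safe" variant is a hypothetical guard, not the committed fix:

```java
public class SubstringRepro {
    // Assumed suffix pattern: kylin_intermediate_<uuid-with-underscores>,
    // where the uuid is 36 characters (hypothetical value of uuidLength).
    static final int UUID_LENGTH = 36;

    // Mirrors StorageCleanupJob line 269: throws when the table name is
    // shorter than the expected uuid suffix (negative begin index).
    static String extractUuidUnsafe(String tableName) {
        return tableName.substring(tableName.length() - UUID_LENGTH).replace("_", "-");
    }

    // Defensive variant (hypothetical): skip names too short to hold a uuid.
    static String extractUuidSafe(String tableName) {
        if (tableName.length() < UUID_LENGTH)
            return null;
        return tableName.substring(tableName.length() - UUID_LENGTH).replace("_", "-");
    }

    public static void main(String[] args) {
        String shortName = "kylin_intermediate_abc"; // cube-name style, no uuid
        try {
            extractUuidUnsafe(shortName);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("out of bounds, as in the report");
        }
        System.out.println(extractUuidSafe(shortName)); // null -> table skipped
    }
}
```

With a name shorter than the uuid suffix, line.length() - uuidLength is negative, which matches the reported "String index out of range" with a negative value.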


> java.lang.StringIndexOutOfBoundsException in 
> org.apache.kylin.storage.hbase.util.StorageCleanupJob
> --
>
> Key: KYLIN-1828
> URL: https://issues.apache.org/jira/browse/KYLIN-1828
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2.1
>Reporter: Richard Calaba
>
> While running storage cleanup job:
> ./bin/kylin.sh org.apache.kylin.storage.hbase.util.StorageCleanupJob --delete 
> true
> I see Hive tables in form 
> kylin_intermediate__1970010100_20160701031500
>  in the default schema.
> While running the above storage cleaner (v.1.5.2.1 - all previously built 
> Cubes Disabled & Dropped) I am getting an error:
> 2016-06-27 22:28:08,480 INFO  [main StorageCleanupJob:262]: Remove 
> intermediate hive table with job id fc44da88-cffc-4710-8726-ff910cf83451 with 
> job status ERROR
> usage: StorageCleanupJob
>  -deleteDelete the unused storage
> Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String 
> index out of range: -2
> at java.lang.String.substring(String.java:1904)
> at 
> org.apache.kylin.storage.hbase.util.StorageCleanupJob.cleanUnusedIntermediateHiveTable(StorageCleanupJob.java:269)
> at 
> org.apache.kylin.storage.hbase.util.StorageCleanupJob.run(StorageCleanupJob.java:91)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at 
> org.apache.kylin.storage.hbase.util.StorageCleanupJob.main(StorageCleanupJob.java:308)
> 2016-06-27 22:28:08,486 INFO  [Thread-0 
> HConnectionManager$HConnectionImplementation:1907]: Closing zookeeper 
> sessionid=0x154c97461586119
> 2016-06-27 22:28:08,491 INFO  [Thread-0 ZooKeeper:684]: Session: 
> 0x154c97461586119 closed
> 2016-06-27 22:28:08,491 INFO  [main-EventThread ClientCnxn:509]: EventThread 
> shut down





[jira] [Created] (KYLIN-1828) java.lang.StringIndexOutOfBoundsException in org.apache.kylin.storage.hbase.util.StorageCleanupJob

2016-06-27 Thread Richard Calaba (JIRA)
Richard Calaba created KYLIN-1828:
-

 Summary: java.lang.StringIndexOutOfBoundsException in 
org.apache.kylin.storage.hbase.util.StorageCleanupJob
 Key: KYLIN-1828
 URL: https://issues.apache.org/jira/browse/KYLIN-1828
 Project: Kylin
  Issue Type: Bug
Affects Versions: v1.5.2.1
Reporter: Richard Calaba


While running storage cleanup job:

./bin/kylin.sh org.apache.kylin.storage.hbase.util.StorageCleanupJob --delete 
true

I see Hive tables in the form 
kylin_intermediate__1970010100_20160701031500
 in the default schema.

While running the above storage cleaner (v.1.5.2.1 - all previously built Cubes 
Disabled & Dropped) I am getting an error:

2016-06-27 22:28:08,480 INFO  [main StorageCleanupJob:262]: Remove intermediate 
hive table with job id fc44da88-cffc-4710-8726-ff910cf83451 with job status 
ERROR
usage: StorageCleanupJob
 -deleteDelete the unused storage
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String 
index out of range: -2
at java.lang.String.substring(String.java:1904)
at 
org.apache.kylin.storage.hbase.util.StorageCleanupJob.cleanUnusedIntermediateHiveTable(StorageCleanupJob.java:269)
at 
org.apache.kylin.storage.hbase.util.StorageCleanupJob.run(StorageCleanupJob.java:91)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at 
org.apache.kylin.storage.hbase.util.StorageCleanupJob.main(StorageCleanupJob.java:308)
2016-06-27 22:28:08,486 INFO  [Thread-0 
HConnectionManager$HConnectionImplementation:1907]: Closing zookeeper 
sessionid=0x154c97461586119
2016-06-27 22:28:08,491 INFO  [Thread-0 ZooKeeper:684]: Session: 
0x154c97461586119 closed
2016-06-27 22:28:08,491 INFO  [main-EventThread ClientCnxn:509]: EventThread 
shut down










[jira] [Closed] (KYLIN-1810) NPE in org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164)

2016-06-21 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba closed KYLIN-1810.
-
Resolution: Fixed

> NPE in 
> org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164)
> 
>
> Key: KYLIN-1810
> URL: https://issues.apache.org/jira/browse/KYLIN-1810
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2.1
>Reporter: Richard Calaba
> Attachments: job_2016_06_21_16_23_51-err.zip
>
>
> Hello,
> running into a weird issue. I have designed a Kylin cube, cloned it to another 
> cube without any changes, and ran the Build job. The Build succeeded. Then I 
> discarded the build job and disabled and dropped the cube. I cloned the 
> same cube again (into a different name than previously) and then again started 
> to build the cube. I am getting the NPE below every time in Step 4 - "Build 
> Dimension Dictionary":
> java.lang.NullPointerException
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.checkDupByContent(SnapshotManager.java:167)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.trySaveNewSnapshot(SnapshotManager.java:128)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:108)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2
> Attaching the Diagnostic logs.
> Any clue how to resolve this ??? 
> I am thinking to wipe all Kylin metadata from repository and try to restore 
> from backup.





[jira] [Comment Edited] (KYLIN-1810) NPE in org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164)

2016-06-21 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343640#comment-15343640
 ] 

Richard Calaba edited comment on KYLIN-1810 at 6/22/16 3:20 AM:


Ok,

did:

1) git clone -b kylin-1.5.2.1
2) cd kylin
3) wget 
https://issues.apache.org/jira/secure/attachment/12804584/initialize_rowIndices_and_dict_during_deserializing_when_snapshot_table_is_empty.patch
3) git apply 
initialize_rowIndices_and_dict_during_deserializing_when_snapshot_table_is_empty.patch
4) rebuild kylin distro and started patched kylin
5) Resumed the failed Cube jobs -> both finished OK

Confirming the resolution & closing the ticket.
Thank you !


was (Author: cal...@gmail.com):
Ok,

did:

1) git clone -b kylin-1.5.2.1
2) cd kylin
3) wget 
https://issues.apache.org/jira/secure/attachment/12804584/initialize_rowIndices_and_dict_during_deserializing_when_snapshot_table_is_empty.patch
3) git apply 
initialize_rowIndices_and_dict_during_deserializing_when_snapshot_table_is_empty.patch
4) rebuild kylin distro and started patched kylin
5) Resumed the failed Cube jobs -> bith finished OK

Confirming the resolution & closing the ticket.
Thank you !

> NPE in 
> org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164)
> 
>
> Key: KYLIN-1810
> URL: https://issues.apache.org/jira/browse/KYLIN-1810
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2.1
>Reporter: Richard Calaba
> Attachments: job_2016_06_21_16_23_51-err.zip
>
>
> Hello,
> running into weird issue. I have designed Kylin cube. Clonned it to another 
> cube without any changes and run the Build job. The Build succeeded. Then I 
> have discarder the build job and disabled and dropped the cube. Clonned the 
> same cube again (into different name than  previously) and then again started 
> to build the cube. Getting an NPE below every time in Step 4  - Build 
> Dimension Dictionary":
> java.lang.NullPointerException
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.checkDupByContent(SnapshotManager.java:167)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.trySaveNewSnapshot(SnapshotManager.java:128)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:108)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2
> Attaching the Diagnostic logs.
> Any clue how to resolve this ??? 
> I am thinking to wipe all Kylin metadata from repository and try to restore 
> from backup.





[jira] [Comment Edited] (KYLIN-1810) NPE in org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164)

2016-06-21 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343208#comment-15343208
 ] 

Richard Calaba edited comment on KYLIN-1810 at 6/22/16 2:25 AM:


And the bad news is that I have created a new cube based on a new model - not 
sharing the fact table Hive View but sharing the Dimensions (Lookup Tables) 
Hive views -> the new cube also fails with the same exception. It seems 'reuse' of 
the table snapshot across cube builds and across different cubes doesn't work ...


was (Author: cal...@gmail.com):
ANd the bad news is that I have created new cube based on new model - not 
hsaring the fact table Hive View but sharing the Dimensions (Lookup Tables) 
Hive views -> the new cube also fails with same exception. Seems 'reuse' of the 
table snapshot across cube builds and across different cubes doesn't work ...

> NPE in 
> org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164)
> 
>
> Key: KYLIN-1810
> URL: https://issues.apache.org/jira/browse/KYLIN-1810
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2.1
>Reporter: Richard Calaba
> Attachments: job_2016_06_21_16_23_51-err.zip
>
>
> Hello,
> running into a weird issue. I have designed a Kylin cube, cloned it to another 
> cube without any changes, and ran the Build job. The Build succeeded. Then I 
> discarded the build job and disabled and dropped the cube. I cloned the 
> same cube again (into a different name than previously) and then again started 
> to build the cube. I am getting the NPE below every time in Step 4 - "Build 
> Dimension Dictionary":
> java.lang.NullPointerException
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.checkDupByContent(SnapshotManager.java:167)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.trySaveNewSnapshot(SnapshotManager.java:128)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:108)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2
> Attaching the Diagnostic logs.
> Any clue how to resolve this ??? 
> I am thinking to wipe all Kylin metadata from repository and try to restore 
> from backup.





[jira] [Commented] (KYLIN-1810) NPE in org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164)

2016-06-21 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343208#comment-15343208
 ] 

Richard Calaba commented on KYLIN-1810:
---

And the bad news is that I have created a new cube based on a new model - not 
sharing the fact table Hive View but sharing the Dimensions (Lookup Tables) 
Hive views -> the new cube also fails with the same exception. It seems 'reuse' of 
the table snapshot across cube builds and across different cubes doesn't work ...

> NPE in 
> org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164)
> 
>
> Key: KYLIN-1810
> URL: https://issues.apache.org/jira/browse/KYLIN-1810
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2.1
>Reporter: Richard Calaba
> Attachments: job_2016_06_21_16_23_51-err.zip
>
>
> Hello,
> running into a weird issue. I have designed a Kylin cube, cloned it to another 
> cube without any changes, and ran the Build job. The Build succeeded. Then I 
> discarded the build job and disabled and dropped the cube. I cloned the 
> same cube again (into a different name than previously) and then again started 
> to build the cube. I am getting the NPE below every time in Step 4 - "Build 
> Dimension Dictionary":
> java.lang.NullPointerException
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.checkDupByContent(SnapshotManager.java:167)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.trySaveNewSnapshot(SnapshotManager.java:128)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:108)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2
> Attaching the Diagnostic logs.
> Any clue how to resolve this ??? 
> I am thinking to wipe all Kylin metadata from repository and try to restore 
> from backup.





[jira] [Updated] (KYLIN-1810) NPE in org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164)

2016-06-21 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-1810:
--
Description: 
Hello,

running into a weird issue. I have designed a Kylin cube, cloned it to another 
cube without any changes, and ran the Build job. The Build succeeded. Then I 
discarded the build job and disabled and dropped the cube. I cloned the 
same cube again (into a different name than previously) and then again started 
to build the cube. I am getting the NPE below every time in Step 4 - "Build Dimension 
Dictionary":

java.lang.NullPointerException
at 
org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164)
at 
org.apache.kylin.dict.lookup.SnapshotManager.checkDupByContent(SnapshotManager.java:167)
at 
org.apache.kylin.dict.lookup.SnapshotManager.trySaveNewSnapshot(SnapshotManager.java:128)
at 
org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:108)
at 
org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
at 
org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
at 
org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
at 
org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at 
org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
at 
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
at 
org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
at 
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
at 
org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

result code:2

Attaching the Diagnostic logs.

Any clue how to resolve this? 

I am thinking of wiping all Kylin metadata from the repository and trying to restore 
from a backup.
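For context, the failure mode behind this trace - an equals() that dereferences internal row state without a null guard, so comparing against an empty, freshly deserialized snapshot throws - can be sketched as below. This is an illustrative reconstruction only, not Kylin's actual SnapshotTable code; the class and field names here are assumptions.

```java
import java.util.List;

// Hypothetical sketch of the NPE pattern: equals() touches a field that can
// be null when an empty snapshot is deserialized (not Kylin's real code).
class SnapshotSketch {
    List<String> rows; // may be null for an empty, freshly-deserialized snapshot

    SnapshotSketch(List<String> rows) { this.rows = rows; }

    // Unsafe variant: throws NullPointerException when rows is null.
    boolean unsafeEquals(SnapshotSketch other) {
        return this.rows.equals(other.rows);
    }

    // Null-safe variant: two null row lists compare as equal.
    boolean safeEquals(SnapshotSketch other) {
        if (this.rows == null || other.rows == null)
            return this.rows == other.rows;
        return this.rows.equals(other.rows);
    }

    public static void main(String[] args) {
        SnapshotSketch empty = new SnapshotSketch(null);
        SnapshotSketch full = new SnapshotSketch(List.of("a", "b"));
        boolean npeThrown = false;
        try {
            empty.unsafeEquals(full);
        } catch (NullPointerException e) {
            npeThrown = true; // same class of failure as in the trace above
        }
        System.out.println(npeThrown && !empty.safeEquals(full) && empty.safeEquals(empty));
    }
}
```

The null-safe variant shows why an empty lookup snapshot would make the duplicate-content check crash rather than simply report "not equal".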



  was:
Hello,

running into a weird issue. I designed a Kylin cube, cloned it to another 
cube without any changes, and ran the Build job. The build succeeded. Then I 
discarded the build job and disabled and dropped the cube. I cloned the 
same cube again (into a different name than previously) and then again started 
to build the cube. I am getting the NPE below every time in Step 4 - "Build Dimension 
Dictionary":

Job Name | Cube | Progress | Last Modified Time | Duration | Actions
JAMBAJUICE_3_0_REL_TRX_POS_CHECK_clone - 1970010100_20160626042000 - BUILD - PDT 2016-06-21 16:29:03 | JAMBAJUICE_3_0_REL_TRX_POS_CHECK_clone | ERROR | 2016-06-21 15:35:21 PST | 5.80 mins | Action
JAMBAJUICE_3_0_REL_TRX_POS_CHECK_clone - 1970010100_20160626042000 - BUILD - PDT 2016-06-21 16:06:48 | JAMBAJUICE_3_0_REL_TRX_POS_CHECK_clone | 100% | 2016-06-21 15:14:24 PST | 6.85 mins | Action
JAMBAJUICE_3_0_REL_TRX_POS_CHECK_clone - 1970010100_2016062600 - BUILD - PDT 2016-06-21 14:10:10 | JAMBAJUICE_3_0_REL_TRX_POS_CHECK_clone | 14.29% | 2016-06-21 14:59:37 PST | 5.70 mins | Action
Total: 3
 Detail Information
Job NameJAMBAJUICE_3_0_REL_TRX_POS_CHECK_clone - 
1970010100_20160626042000 - BUILD - PDT 2016-06-21 16:29:03
Job ID  ac090c87-496d-4173-9503-6a9ec97a764e
Status  ERROR
Duration5.80 mins
MapReduce Waiting   0.18 mins
Start   2016-06-21 15:29:32 PST
 2016-06-21 15:29:32 PST
#1 Step Name: Create Intermediate Flat Hive Table
Duration: 2.82 mins
  
 2016-06-21 15:32:21 PST
#2 Step Name: Materialize Hive View in Lookup Tables
Duration: 2.11 mins
  
 2016-06-21 15:34:28 PST
#3 Step Name: Extract Fact Table Distinct Columns
Duration: 0.86 mins
  
 2016-06-21 15:35:20 PST
#4 Step Name: Build Dimension Dictionary
Duration: 0.02 mins
 
#5 Step Name: Save Cuboid Statistics
Duration: 0 seconds
#6 Step Name: Create HTable
Duration: 0 seconds
#7 Step Name: Build Base Cuboid Data
Duration: 0 seconds
#8 Step Name: Build N-Dimension Cuboid Data : 8-Dimension
Duration: 0 seconds
#9 Step Name: Build N-Dimension Cuboid Data : 7-Dimension
Duration: 0 seconds
#10 Step Name: Build N-Dimension Cuboid Data : 6-Dimension
Duration: 0 seconds
#11 Step Name: Build N-Dimension Cuboid Data : 5-Dimension
Duration: 0 seconds
#12 

[jira] [Updated] (KYLIN-1810) NPE in org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164)

2016-06-21 Thread Richard Calaba (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-1810:
--
Attachment: job_2016_06_21_16_23_51-err.zip

> NPE in 
> org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164)
> 
>
> Key: KYLIN-1810
> URL: https://issues.apache.org/jira/browse/KYLIN-1810
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2.1
>Reporter: Richard Calaba
> Attachments: job_2016_06_21_16_23_51-err.zip
>
>
> Hello,
> running into a weird issue. I designed a Kylin cube, cloned it to another 
> cube without any changes, and ran the Build job. The build succeeded. Then I 
> discarded the build job and disabled and dropped the cube. I cloned the 
> same cube again (into a different name than previously) and then again started 
> to build the cube. I am getting the NPE below every time in Step 4 - "Build 
> Dimension Dictionary":
> Job Name | Cube | Progress | Last Modified Time | Duration | Actions
> JAMBAJUICE_3_0_REL_TRX_POS_CHECK_clone - 1970010100_20160626042000 - BUILD - PDT 2016-06-21 16:29:03 | JAMBAJUICE_3_0_REL_TRX_POS_CHECK_clone | ERROR | 2016-06-21 15:35:21 PST | 5.80 mins | Action
> JAMBAJUICE_3_0_REL_TRX_POS_CHECK_clone - 1970010100_20160626042000 - BUILD - PDT 2016-06-21 16:06:48 | JAMBAJUICE_3_0_REL_TRX_POS_CHECK_clone | 100% | 2016-06-21 15:14:24 PST | 6.85 mins | Action
> JAMBAJUICE_3_0_REL_TRX_POS_CHECK_clone - 1970010100_2016062600 - BUILD - PDT 2016-06-21 14:10:10 | JAMBAJUICE_3_0_REL_TRX_POS_CHECK_clone | 14.29% | 2016-06-21 14:59:37 PST | 5.70 mins | Action
> Total: 3
>  Detail Information
> Job Name  JAMBAJUICE_3_0_REL_TRX_POS_CHECK_clone - 
> 1970010100_20160626042000 - BUILD - PDT 2016-06-21 16:29:03
> Job IDac090c87-496d-4173-9503-6a9ec97a764e
> StatusERROR
> Duration  5.80 mins
> MapReduce Waiting 0.18 mins
> Start   2016-06-21 15:29:32 PST
>  2016-06-21 15:29:32 PST
> #1 Step Name: Create Intermediate Flat Hive Table
> Duration: 2.82 mins
>   
>  2016-06-21 15:32:21 PST
> #2 Step Name: Materialize Hive View in Lookup Tables
> Duration: 2.11 mins
>   
>  2016-06-21 15:34:28 PST
> #3 Step Name: Extract Fact Table Distinct Columns
> Duration: 0.86 mins
>   
>  2016-06-21 15:35:20 PST
> #4 Step Name: Build Dimension Dictionary
> Duration: 0.02 mins
>  
> #5 Step Name: Save Cuboid Statistics
> Duration: 0 seconds
> #6 Step Name: Create HTable
> Duration: 0 seconds
> #7 Step Name: Build Base Cuboid Data
> Duration: 0 seconds
> #8 Step Name: Build N-Dimension Cuboid Data : 8-Dimension
> Duration: 0 seconds
> #9 Step Name: Build N-Dimension Cuboid Data : 7-Dimension
> Duration: 0 seconds
> #10 Step Name: Build N-Dimension Cuboid Data : 6-Dimension
> Duration: 0 seconds
> #11 Step Name: Build N-Dimension Cuboid Data : 5-Dimension
> Duration: 0 seconds
> #12 Step Name: Build N-Dimension Cuboid Data : 4-Dimension
> Duration: 0 seconds
> #13 Step Name: Build N-Dimension Cuboid Data : 3-Dimension
> Duration: 0 seconds
> #14 Step Name: Build N-Dimension Cuboid Data : 2-Dimension
> Duration: 0 seconds
> #15 Step Name: Build N-Dimension Cuboid Data : 1-Dimension
> Duration: 0 seconds
> #16 Step Name: Build N-Dimension Cuboid Data : 0-Dimension
> Duration: 0 seconds
> #17 Step Name: Build Cube
> Duration: 0 seconds
> #18 Step Name: Convert Cuboid Data to HFile
> Duration: 0 seconds
> #19 Step Name: Load HFile to HBase Table
> Duration: 0 seconds
> #20 Step Name: Update Cube Info
> Duration: 0 seconds
> #21 Step Name: Garbage Collection
> Duration: 0 seconds
> End   
>  Apache Kylin |  Apache Kylin Community
> Output 
> java.lang.NullPointerException
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:164)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.checkDupByContent(SnapshotManager.java:167)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.trySaveNewSnapshot(SnapshotManager.java:128)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:108)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at 

[jira] [Commented] (KYLIN-1704) When load empty snapshot, NULL Pointer Exception occurs

2016-06-16 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334202#comment-15334202
 ] 

Richard Calaba commented on KYLIN-1704:
---

Sorry - I cannot reproduce it anymore - before, I was getting the above-mentioned NPE 
in Cube Build Step 3. Now in the same build step I am getting another 
exception:

java.lang.IllegalStateException: Dup key found, key=[null], 
value1=[null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,20160324],
 
value2=[null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,20160324]
at org.apache.kylin.dict.lookup.LookupTable.initRow(LookupTable.java:83)
at org.apache.kylin.dict.lookup.LookupTable.init(LookupTable.java:66)
at org.apache.kylin.dict.lookup.LookupTable.<init>(LookupTable.java:54)
at 
org.apache.kylin.dict.lookup.LookupStringTable.<init>(LookupStringTable.java:33)

It could be that the data has changed in the meantime ... but I am not 100% sure, so I do 
not want to confuse you ... 

Are these duplicate lookup table entry checks something new in Kylin 1.5.2.1? 
Asking because the previous cube I cloned from was built successfully, but on an earlier 
Kylin version ... 
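The "Dup key found" message above matches the usual fail-fast guard when loading a lookup table into an in-memory map keyed on the join columns: a second row with the same key aborts instead of silently overwriting. A minimal sketch of that guard (hypothetical code, not Kylin's actual LookupTable.initRow; note that in the report above the key itself is [null], suggesting the view produced all-null join columns):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a fail-fast duplicate-key check while building an
// in-memory lookup table keyed on the join columns (illustrative only).
class LookupSketch {
    private final Map<String, String[]> rows = new HashMap<>();

    void initRow(String key, String[] row) {
        String[] previous = rows.put(key, row);
        if (previous != null) {
            throw new IllegalStateException("Dup key found, key=[" + key + "]");
        }
    }

    public static void main(String[] args) {
        LookupSketch table = new LookupSketch();
        table.initRow("prod-1", new String[] {"prod-1", "desc-a"});
        boolean dupDetected = false;
        try {
            // A second row with the same key, as in the reported error.
            table.initRow("prod-1", new String[] {"prod-1", "desc-a"});
        } catch (IllegalStateException e) {
            dupDetected = true;
        }
        System.out.println(dupDetected);
    }
}
```

Such a check trips even when the duplicate rows are byte-for-byte identical, which is consistent with value1 and value2 in the error being the same.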

> When load empty snapshot, NULL Pointer Exception occurs
> ---
>
> Key: KYLIN-1704
> URL: https://issues.apache.org/jira/browse/KYLIN-1704
> Project: Kylin
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: v1.5.0, v1.5.1, v1.5.2
>Reporter: Zhong Yanghong
>Assignee: Zhong Yanghong
> Fix For: v1.5.3
>
> Attachments: 
> initialize_rowIndices_and_dict_during_deserializing_when_snapshot_table_is_empty.patch
>
>
> Error Log: java.lang.NullPointerException
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:163)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.checkDupByContent(SnapshotManager.java:164)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.trySaveNewSnapshot(SnapshotManager.java:125)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:105)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:205)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:105)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:105)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1704) When load empty snapshot, NULL Pointer Exception occurs

2016-06-16 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333448#comment-15333448
 ] 

Richard Calaba commented on KYLIN-1704:
---

OK - I will try to reproduce it and attach logs tomorrow - it is 2:30am here in 
California :)

> When load empty snapshot, NULL Pointer Exception occurs
> ---
>
> Key: KYLIN-1704
> URL: https://issues.apache.org/jira/browse/KYLIN-1704
> Project: Kylin
>  Issue Type: Bug
>  Components: Metadata
>Reporter: Zhong Yanghong
>Assignee: Zhong Yanghong
> Attachments: 
> initialize_rowIndices_and_dict_during_deserializing_when_snapshot_table_is_empty.patch
>
>
> Error Log: java.lang.NullPointerException
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.equals(SnapshotTable.java:163)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.checkDupByContent(SnapshotManager.java:164)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.trySaveNewSnapshot(SnapshotManager.java:125)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:105)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:205)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:105)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:105)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1789) Couldn't use View as Lookup when join type is "inner"

2016-06-15 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332920#comment-15332920
 ] 

Richard Calaba commented on KYLIN-1789:
---

Additional info - also obtained from Bhanu Mohanty - the reporter of the bug:

While looking into logs he saw a message:

Dup key found, key=[-289271615434074838,-7076210457049756771], 
value1=[2016-03-24,-289271615434074838,-7076210457049756771,Medium,6,4,,null,null,null,null,null],
 
value2=[2016-03-24,-289271615434074838,-7076210457049756771,Medium,6,4,,null,null,null,null,null]
at org.apache.kylin.dict.lookup.LookupTable.initRow(LookupTable.java:83)

That would indicate that the inner join expects to find only one record in the 
lookup table ... if this is true, and the inner join on the lookup requires one 
record, then it is a clear inconsistency (1st, the model is defined as FACT INNER 
JOIN LOOKUP ON key1, key2, ..., so this has to allow multiple candidates in the 
lookup (even though this is not typical); 2nd, there is no reason why a left outer join 
would accept duplicate key entries in the right operand while an inner join won't 
allow it).


> Couldn't use View as Lookup when join type is "inner"
> -
>
> Key: KYLIN-1789
> URL: https://issues.apache.org/jira/browse/KYLIN-1789
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Shaofeng SHI
>Assignee: Shaofeng SHI
> Fix For: v1.5.3
>
>
> Reported by Bhanu Mohanty in user mailing list:
> I am using kylin-1.5.2.1
> Added hive view as a look up table 
> Getting  error at Build Dimension Dictionary
> DEFAULT.kylin_intermediate_DEFAULT_*
> If the join is "inner" 
> It works when I changed the join to "left" 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KYLIN-1576) Support of new join type in the Cube Model - Temporal Join

2016-06-15 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330956#comment-15330956
 ] 

Richard Calaba edited comment on KYLIN-1576 at 6/15/16 9:03 PM:


Interestingly - I just found out that Apache Drill supports this scenario - at 
least two years ago they patched it to support this if at least one equi-join 
condition is used: https://issues.apache.org/jira/browse/DRILL-485 

So questions:

 1) Why can't Hive implement the same - it seems the argument in the docs doesn't 
hold anymore ...
 2) Should / can the Kylin cube build use Apache Drill while building the data 
cubes? 

Update in regards to 2) - can the Kylin cube build use Apache Drill ... it seems 
this is on eBay's roadmap ... considering slide 68 here: 
http://events.mapr.com/BayAreaApacheDrill and here 
https://www.slideshare.net/secret/lMvMrzy9mFeyBP/?utm_source=ion_medium=Meetups_campaign=ION_MKT_HUG_ApacheDrill_GA.
 Moreover, it seems they might implement the idea of running a Drill query in case 
a Kylin query cannot be evaluated ... that's awesome ... and it is the same idea we are 
trying to implement at Fishbowl.
  


was (Author: cal...@gmail.com):
Interestingly - I just found out that Apache Drill supports this scenario - at 
least two years ago they patched it to support this if at least one equi-join 
condition is used: https://issues.apache.org/jira/browse/DRILL-485 

So questions:

 1) Why can't Hive implement the same - it seems the argument in the docs doesn't 
hold anymore ...
 2) Should / can the Kylin cube build use Apache Drill while building the data 
cubes? 

> Support of new join type in the Cube Model - Temporal Join
> --
>
> Key: KYLIN-1576
> URL: https://issues.apache.org/jira/browse/KYLIN-1576
> Project: Kylin
>  Issue Type: New Feature
>  Components: General
>Affects Versions: Future
>Reporter: Richard Calaba
>Priority: Blocker
>
> There is a notion of time-dependent master data in many business scenarios. 
> Typically modeled with granularity 1 day (datefrom, dateto fields of type 
> DATE defining validity time of one master data record). Occasionally you can 
> think of lower granularity so use of TIMESTAMP can be also seen as an valid 
> scenario). Example of such master data definition could be:
> Master Data / Dimension Table:
> =
> KEY: PRODUCT_ID, DATE_TO, 
> NON-KEY: DATE_FROM, PRODUCT_DESCRIPTION
> - assuming that PRODUCT_DESCRIPTION cannot have two values during one day, it is 
> assumed that DATE_FROM <= DATE_TO and also that there are no overlapping 
> intervals (DATE_FROM, DATE_TO) for all PRODUCT master data
> - the KEY is then intentionally defined as (PRODUCT_ID, DATE_TO), so the 
> statement SELECT * from PRODUCT WHERE ID = 'prod_key_1' AND DATE_TO >= 
> today/now AND DATE_FROM <= today/now is an efficient way to retrieve the 'current' 
> PRODUCT master data (description). The today/now value is also called the 'key 
> date'.
> - now if I have transaction data (FACT table) of product sales, i.e:
> SALES_DATE, PRODUCT_ID, STORE_ID, 
> I would like to show the Sold Products at Store at certain date and also show 
> the Description of the product at the date of product sale (assuming here 
> that there is product catalog which can be updated independently, but for 
> auditing purposes the original product description used during sale is needed 
> to be displayed/used).
> The SQL for the temporal join would be then:
> SELECT S.PRODUCT_ID, S.SALES_DATE, P.PRODUCT_DESCRIPTION 
> FROM SALES as S LEFT OUTER JOIN PRODUCT as P 
> ON S.PRODUCT_ID = P.PRODUCT_ID 
> AND S.SALES_DATE >= P.DATE_FROM 
> AND S.SALES_DATE <= P.DATE_TO 
> (also INNER TEMPORAL JOIN can be defined and be valid in some scenarios but 
> in this case it won't be the proper join - we need to show the product sales 
> even the description wasn't maintained in product master data)
> (some more details for temporal joins - see i.e. here - 
> http://scn.sap.com/community/hana-in-memory/blog/2014/09/28/using-temporal-join-to-fetch-the-result-set-within-the-time-interval
>  )
> This scenario can be supported by Kylin if following enhancement would be 
> done:
> 1) The Cube Model allowing to define a special variant of LEFT OUTER and INNER 
> joins (suggested name: temporal (left outer/inner) join) which forces the user to 
> specify a 'key date' as an expression (column / constant / ...) from the FACT 
> table and two validity fields ('valid from' and 'valid to') from the LOOKUP 
> table. Those two validity fields define the master data record validity 
> period. Supported types for those fields should be DATE; optionally TIMESTAMP 
> is also fine but rarely used in business scenarios. 
> Other option rather then defining new join type is to loosen the join 
> condition and allowing <= and >= operands to be 

[jira] [Commented] (KYLIN-1786) Frontend work for KYLIN-1313

2016-06-15 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332470#comment-15332470
 ] 

Richard Calaba commented on KYLIN-1786:
---

Hello - is it possible to request a change from Minor to Critical on this 
issue?

And as [~mahongbin] pointed out in https://issues.apache.org/jira/browse/KYLIN-1313, 
please include a UI option to be able to derive a dimension also from the fact 
table. That should be supported by the backend already.

The reason for Critical is:

 1) It relates to https://issues.apache.org/jira/browse/KYLIN-1313 - which 
is Major priority.
 
 2) The lack of this function on the UI (derived dimensions for the fact table) is 
a major drawback in the JIRA issue 
https://issues.apache.org/jira/browse/KYLIN-1576, which is trying to resolve a 
functionality gap for so-called time-dependent attributes. It is possible to use 
some view-based workarounds to bring time-dependent attributes in as fact table 
attributes (calculated by a lookup to the dimension table with respect to the 
date column in the fact table), but not being able to define those time-dependent 
attributes as a derived dimension causes significant inefficiency while building 
the cube.


> Frontend work for KYLIN-1313
> 
>
> Key: KYLIN-1786
> URL: https://issues.apache.org/jira/browse/KYLIN-1786
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Reporter: Dong Li
>Assignee: Zhong,Jason
>Priority: Minor
> Attachments: 屏幕快照 2016-06-15 12.22.54.png
>
>
> KYLIN-1313 introduced a measure called extendedcolumn, but seems not enabled 
> on WebUI, see attached screenshot.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KYLIN-1576) Support of new join type in the Cube Model - Temporal Join

2016-06-14 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330891#comment-15330891
 ] 

Richard Calaba edited comment on KYLIN-1576 at 6/15/16 1:04 AM:


One workaround - not generic, but it solves the case where the temporal join logic 
is composed of:
  - an equality join on the entity ID (id1, ... id-N)
  - 2 non-equality joins on the entity validity (date_from, date_to) 

is to define a new fact table which includes the original fact table and the 
time-dependent attributes.

So the old join condition in the model:

  - FROM fact_table
LEFT OUTER JOIN time_dependent_attrs 
ON fact_table.id1 = time_dependent_attrs.id1 (AND  
fact_table.id-N = time_dependent_attrs.id-N)*
AND fact_table.transaction_date <= time_dependent_attrs.date_to 
AND fact_table.transaction_date >= 
time_dependent_attrs.date_from

 you can define new fact table this way to achieve same logic:

create table/view fact_table_new AS 
SELECT fact_table.*, 
timedep.attr1,  timedep.attr2,  
FROM fact_table AS fact
 LEFT OUTER JOIN time_dependent_attrs AS timedep
 ON fact.id1 = timedep.id1 (AND fact.id-N = timedep.customer_id-N)* 
  
WHERE fact.transaction_date BETWEEN timedep.date_from AND 
timedep.date_to;

The draw-back of this solution: you will have to make all time-dependent attributes (if 
they need to be used for grouping) separate normal dimensions - Kylin cannot 
utilize the optimized logic for derived dimensions. So this solution is 
practical only for a small number of time-dependent attributes. If this JIRA 
ticket (shown as resolved) - https://issues.apache.org/jira/browse/KYLIN-1313 - 
were true ... then you should be able to define those time-dependent attributes 
as part of a derived dimension ... but unfortunately I didn't find out how to do 
it on the UI - in Kylin 1.5.2

The 2nd workaround (instead of creating a new fact table) is to create a new 
dimension table (lookup table) where you map the records from the time-dependent 
table to the keys of the original fact table:

create table/view new_dim_table AS 
SELECT fact_table.id-1, (fact_table.id-N)*,  
timedep.*
FROM fact_table AS fact
 LEFT OUTER JOIN time_dependent_attrs AS timedep
 ON fact.id1 = timedep.id1 (AND fact.id-N = timedep.id-N)*   
WHERE fact.transaction_date BETWEEN timedep.date_from AND 
timedep.date_to;

And then you can use Kylin Model to define:
 
 fact_table INNER JOIN new_dim_table (you do not have to use LEFT OUTER 
JOIN here anymore)
ON fact_table.id1 = new_dim_table.id1 (AND fact_table.id-N = 
new_dim_table.id-N)* 

This way you will get a dimension table of the same size as the fact table -> but 
you can still utilize the derived dimension benefits in Kylin.
 


was (Author: cal...@gmail.com):
One workaround - not generic, but it solves the case where the temporal join logic 
is composed of:
  - an equality join on the entity ID (id1, ... id-N)
  - 2 non-equality joins on the entity validity (date_from, date_to) 

is to define a new fact table which includes the original fact table and the 
time-dependent attributes.

So the old join condition in the model:

  - FROM fact_table
LEFT OUTER JOIN time_dependent_attrs 
ON fact_table.id1 = time_dependent_attrs.id1 (AND  
fact_table.id-N = time_dependent_attrs.id-N)*
AND fact_table.transaction_date <= time_dependent_attrs.date_to 
AND fact_table.transaction_date >= 
time_dependent_attrs.date_from

 you can define new fact table this way to achieve same logic:

create table/view fact_table_new AS 
SELECT fact_table.*, 
timedep.attr1,  timedep.attr2,  
FROM fact_table AS fact
 LEFT OUTER JOIN time_dependent_attrs AS timedep
 ON fact.id1 = timedep.id1 (AND fact.id-N = timedep.customer_id-N)* 
  
WHERE fact.transaction_date BETWEEN timedep.date_from AND 
timedep.date_to;

The draw-back of this solution: you will have to make all time-dependent attributes (if 
they need to be used for grouping) separate normal dimensions - Kylin cannot 
utilize the optimized logic for derived dimensions. So this solution is 
practical only for a small number of time-dependent attributes.

The 2nd workaround (instead of creating a new fact table) is to create a new 
dimension table (lookup table) where you map the records from the time-dependent 
table to the keys of the original fact table:

create table/view new_dim_table AS 
SELECT fact_table.id-1, (fact_table.id-N)*,  
timedep.*
FROM fact_table AS fact
 LEFT OUTER JOIN time_dependent_attrs AS timedep
 ON fact.id1 = timedep.id1 (AND fact.id-N = timedep.id-N)*   
WHERE fact.transaction_date BETWEEN timedep.date_from AND 
timedep.date_to;

And then you can 

[jira] [Commented] (KYLIN-1313) Enable deriving dimensions on non PK/FK

2016-06-14 Thread Richard Calaba (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330963#comment-15330963
 ] 

Richard Calaba commented on KYLIN-1313:
---

Similar question - playing with Kylin 1.5.2 today, I didn't see anywhere the 
ability (on the UI) to specify that a dimension can be derived from the fact 
table ... where is it?

> Enable deriving dimensions on non PK/FK
> ---
>
> Key: KYLIN-1313
> URL: https://issues.apache.org/jira/browse/KYLIN-1313
> Project: Kylin
>  Issue Type: Improvement
>Reporter: hongbin ma
>Assignee: hongbin ma
> Fix For: v1.5.2
>
>
> currently a derived column has to be a column on the lookup table, and the derived 
> host column has to be a PK/FK (it's also a problem when the lookup table grows 
> very large). Sometimes columns on the fact table exhibit a deriving relationship 
> too. Here's an example fact table:
> (dt date, seller_id bigint, seller_name varchar(100), item_id bigint, 
> item_url varchar(1000), count decimal, price decimal)
> seller_name is uniquely determined by each seller_id, and item_url is 
> uniquely determined by each item_id. The users do not expect to do 
> filtering on columns like seller_name or item_url; they just want to retrieve 
> them when they do grouping/filtering on other dimensions like seller_id, item_id, 
> or even other dimensions like dt.
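The deriving relationship described above (seller_name functionally determined by seller_id) amounts to resolving the derived column from its host column at query time instead of materializing it into every cuboid row. A toy sketch of that idea follows; it is illustrative only, with made-up names, and is not Kylin's derived-dimension implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of a derived column: seller_name is not stored per cuboid row;
// it is resolved from the host column (seller_id) when results are returned.
// Illustrative only - not Kylin's actual derived-dimension code.
class DerivedColumnSketch {
    private final Map<Long, String> sellerNameBySellerId = new HashMap<>();

    void register(long sellerId, String sellerName) {
        sellerNameBySellerId.put(sellerId, sellerName);
    }

    // Resolve the derived column from the host dimension value.
    String derive(long sellerId) {
        return sellerNameBySellerId.get(sellerId);
    }

    public static void main(String[] args) {
        DerivedColumnSketch derived = new DerivedColumnSketch();
        derived.register(42L, "Acme Corp");
        System.out.println("Acme Corp".equals(derived.derive(42L)));
    }
}
```

The trade-off is that derived columns cannot be filtered or grouped on efficiently, which matches the expectation stated in the issue description.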



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

