[jira] [Commented] (KYLIN-1965) Check duplicated measure name

2016-09-05 Thread Zhong,Jason (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466317#comment-15466317
 ] 

Zhong,Jason commented on KYLIN-1965:


Looks good. I have merged your latest code to master.

> Check duplicated measure name
> -
>
> Key: KYLIN-1965
> URL: https://issues.apache.org/jira/browse/KYLIN-1965
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: v1.5.2, v1.5.3
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v1.5.4
>
> Attachments: KYLIN-1965-bug-fix.patch, KYLIN-1965.patch
>
>
> Duplicated measure names cause queries to fail, so we should check for 
> duplicated measure names.
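
The check requested above could look roughly like the following. This is only a minimal sketch, not the code from the attached patches: the class name, the method names, and the sample measure names are hypothetical, and it assumes names are compared by exact string match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not Kylin's actual validation code.
class DuplicateMeasureNameCheck {

    /** Returns the measure names that occur more than once, in first-seen order. */
    static List<String> findDuplicates(List<String> measureNames) {
        Set<String> seen = new LinkedHashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String name : measureNames) {
            // Set.add returns false when the name was already present.
            if (!seen.add(name)) {
                duplicates.add(name);
            }
        }
        return new ArrayList<>(duplicates);
    }

    /** Rejects the measure list up front instead of letting a query fail later. */
    static void validate(List<String> measureNames) {
        List<String> duplicates = findDuplicates(measureNames);
        if (!duplicates.isEmpty()) {
            throw new IllegalStateException("Duplicated measure name(s): " + duplicates);
        }
    }

    public static void main(String[] args) {
        // Sample names are made up for the example.
        List<String> names = Arrays.asList("_COUNT_", "SUM_PRICE", "SUM_PRICE");
        System.out.println(findDuplicates(names)); // prints [SUM_PRICE]
    }
}
```

Running the check when the cube description is saved, rather than at query time, is one way to surface the problem before any query can hit it.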



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KYLIN-1965) Check duplicated measure name

2016-09-05 Thread Zhixiong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhixiong Chen updated KYLIN-1965:
-
Attachment: (was: 0001-KYLIN-1965-UPDATE.patch)

> Check duplicated measure name
> -
>
> Key: KYLIN-1965
> URL: https://issues.apache.org/jira/browse/KYLIN-1965
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: v1.5.2, v1.5.3
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v1.5.4
>
> Attachments: KYLIN-1965-bug-fix.patch, KYLIN-1965.patch
>
>
> Duplicated measure names cause queries to fail, so we should check for 
> duplicated measure names.





[jira] [Created] (KYLIN-1996) Keep original column order when designing cube

2016-09-05 Thread Billy(Yiming) Liu (JIRA)
Billy(Yiming) Liu created KYLIN-1996:


 Summary: Keep original column order when designing cube 
 Key: KYLIN-1996
 URL: https://issues.apache.org/jira/browse/KYLIN-1996
 Project: Kylin
  Issue Type: Bug
  Components: Web 
Affects Versions: v1.5.2
Reporter: Billy(Yiming) Liu
Assignee: Zhong,Jason
Priority: Minor


[Quote from Luke's mail]
I think the designer should keep the original column order when creating a 
"data model", and allow the user to re-order columns within the data model.

When creating a cube, it should also keep the same order as the data model.

The user could adjust the order later, after the dimensions and metrics have 
been created.






[jira] [Comment Edited] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-09-05 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464661#comment-15464661
 ] 

liyang edited comment on KYLIN-1834 at 9/5/16 10:07 AM:


Before the fix, the dictionary broke down at around 1GB, I guess.
After the fix, 2GB is the upper limit of dictionary size, and a clear error 
will be thrown if the size goes beyond that.
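
As an illustration of "a clear error if the size goes beyond", a guard of the following kind could be used. This is a sketch under assumptions, not the actual fix: the class and method names are hypothetical, and it assumes the 2GB limit corresponds to the largest value a Java int can hold.

```java
// Hypothetical guard for illustration; not the code of the actual fix.
class DictionarySizeGuard {

    // Assumption: "2GB" here means the largest value a Java int can represent.
    static final long MAX_DICTIONARY_BYTES = Integer.MAX_VALUE;

    /** Returns the size as an int, or fails with an explicit message. */
    static int checkedSize(long requestedBytes) {
        if (requestedBytes < 0 || requestedBytes > MAX_DICTIONARY_BYTES) {
            throw new IllegalStateException("Dictionary size " + requestedBytes
                    + " bytes is outside the supported range [0, "
                    + MAX_DICTIONARY_BYTES + "]");
        }
        return (int) requestedBytes;
    }

    public static void main(String[] args) {
        System.out.println(checkedSize(700L * 1024 * 1024)); // prints 734003200
        try {
            checkedSize(3L * 1024 * 1024 * 1024); // 3 GiB, beyond the limit
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Doing the size arithmetic in long and converting to int only after the check is what keeps the failure explicit instead of surfacing later as an unrelated exception.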


was (Author: liyang.g...@gmail.com):
Before the fix, dictionary broke down around 1GB I guess.
After the fix, 2GB is the upper limit of dictionary size. And a clear error 
will be thrown the size goes beyond.

> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build 
> Dimension Dictionary
> --
>
> Key: KYLIN-1834
> URL: https://issues.apache.org/jira/browse/KYLIN-1834
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Assignee: liyang
>Priority: Blocker
> Fix For: v1.5.4
>
> Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>
> Getting exception in Step 4 - Build Dimension Dictionary:
> java.lang.IllegalArgumentException: Value not exists!
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
>   at 
> org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2
> The code which generates the exception is:
> org.apache.kylin.dimension.Dictionary.java:
>     /**
>      * A lower level API, return ID integer from raw value bytes. In case of not found 
>      * 
>      * - if roundingFlag=0, throw IllegalArgumentException; 
>      * - if roundingFlag<0, the closest smaller ID integer if exist; 
>      * - if roundingFlag>0, the closest bigger ID integer if exist. 
>      * 
>      * Bypassing the cache layer, this could be significantly slower than getIdFromValue(T value).
>      * 
>      * @throws IllegalArgumentException
>      *             if value is not found in dictionary and rounding is off;
>      *             or if rounding cannot find a smaller or bigger ID
>      */
>     final public int getIdFromValueBytes(byte[] value, int offset, int len, int roundingFlag) throws IllegalArgumentException {
>         if (isNullByteForm(value, offset, len))
>             return nullId();
>         else {
>             int id = getIdFromValueBytesImpl(value, offset, len, roundingFlag);
>             if (id < 0)
>                 throw new IllegalArgumentException("Value not exists!");
>             return id;
>         }
>     }
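
The rounding semantics described in the quoted Javadoc above can be illustrated with a small stand-in. This is only a sketch: it uses a sorted array and binary search rather than Kylin's TrieDictionary, the class name is hypothetical, and it assumes the ID of a value is its index among the distinct, sorted values.

```java
import java.util.Arrays;

// Hypothetical stand-in for a dictionary; IDs are indexes into a sorted array.
class RoundingLookupSketch {

    private final String[] sortedValues;

    RoundingLookupSketch(String... distinctValues) {
        this.sortedValues = distinctValues.clone();
        Arrays.sort(this.sortedValues);
    }

    /**
     * roundingFlag = 0: exact match only;
     * roundingFlag < 0: fall back to the closest smaller value;
     * roundingFlag > 0: fall back to the closest bigger value.
     */
    int getId(String value, int roundingFlag) {
        int pos = Arrays.binarySearch(sortedValues, value);
        if (pos >= 0) {
            return pos; // exact match
        }
        // For a miss, binarySearch returns -(insertionPoint) - 1.
        int insertionPoint = -pos - 1;
        int id = -1;
        if (roundingFlag < 0) {
            id = insertionPoint - 1; // stays -1 when nothing is smaller
        } else if (roundingFlag > 0 && insertionPoint < sortedValues.length) {
            id = insertionPoint; // first value bigger than the one requested
        }
        if (id < 0) {
            throw new IllegalArgumentException("Value not exists!");
        }
        return id;
    }

    public static void main(String[] args) {
        RoundingLookupSketch dict = new RoundingLookupSketch("apple", "cherry", "mango");
        System.out.println(dict.getId("cherry", 0));  // prints 1
        System.out.println(dict.getId("banana", -1)); // prints 0
        System.out.println(dict.getId("banana", 1));  // prints 1
    }
}
```

With roundingFlag=0 a missing value always ends in the exception, which matches the failure reported in this issue.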
> ==
> The Cube is big - fact 110 mio rows, the largest dimension (customer) has 10 
> mio rows. I have increased the JVM -Xmx to 16gb and set the 
> kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube 
> build doesn't fail (previously we were getting exception complaining about 
> the 300MB limit for Dimension dictionary size (req. approx 700MB)).
> ==
> Before that we were getting exception 

[jira] [Commented] (KYLIN-1834) java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary

2016-09-05 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464661#comment-15464661
 ] 

liyang commented on KYLIN-1834:
---

Before the fix, dictionary broke down around 1GB I guess.
After the fix, 2GB is the upper limit of dictionary size. And a clear error 
will be thrown the size goes beyond.

> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build 
> Dimension Dictionary
> --
>
> Key: KYLIN-1834
> URL: https://issues.apache.org/jira/browse/KYLIN-1834
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2, v1.5.2.1
>Reporter: Richard Calaba
>Assignee: liyang
>Priority: Blocker
> Fix For: v1.5.4
>
> Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>
> Getting exception in Step 4 - Build Dimension Dictionary:
> java.lang.IllegalArgumentException: Value not exists!
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
>   at 
> org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
>   at 
> org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
>   at 
> org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
>   at 
> org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2
> The code which generates the exception is:
> org.apache.kylin.dimension.Dictionary.java:
>     /**
>      * A lower level API, return ID integer from raw value bytes. In case of not found 
>      * 
>      * - if roundingFlag=0, throw IllegalArgumentException; 
>      * - if roundingFlag<0, the closest smaller ID integer if exist; 
>      * - if roundingFlag>0, the closest bigger ID integer if exist. 
>      * 
>      * Bypassing the cache layer, this could be significantly slower than getIdFromValue(T value).
>      * 
>      * @throws IllegalArgumentException
>      *             if value is not found in dictionary and rounding is off;
>      *             or if rounding cannot find a smaller or bigger ID
>      */
>     final public int getIdFromValueBytes(byte[] value, int offset, int len, int roundingFlag) throws IllegalArgumentException {
>         if (isNullByteForm(value, offset, len))
>             return nullId();
>         else {
>             int id = getIdFromValueBytesImpl(value, offset, len, roundingFlag);
>             if (id < 0)
>                 throw new IllegalArgumentException("Value not exists!");
>             return id;
>         }
>     }
> ==
> The Cube is big - fact 110 mio rows, the largest dimension (customer) has 10 
> mio rows. I have increased the JVM -Xmx to 16gb and set the 
> kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube 
> build doesn't fail (previously we were getting exception complaining about 
> the 300MB limit for Dimension dictionary size (req. approx 700MB)).
> ==
> Before that we were getting exception complaining about the Dictionary 
> encoding problem - "Too high cardinality is not suitable for dictionary -- 
> cardinality: 10873977" - this we resolved by changing the affected 
> dimension/row key Encoding from "dict" to "int; length=8" on the Advanced 
> Settings of 

[jira] [Commented] (KYLIN-1973) java.lang.NegativeArraySizeException when Build Dimension Dictionary

2016-09-05 Thread liyang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464641#comment-15464641
 ] 

liyang commented on KYLIN-1973:
---

The root cause is that the dictionary is bigger than 2GB and overflows the int 
range, as mentioned in KYLIN-1834.
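
The mechanism can be reproduced in isolation. The sketch below is not Kylin's code path: the class, the method names, and the numbers are made up for illustration. It only shows how a size computed in int arithmetic wraps to a negative value once it passes 2GB, and how allocating an array with that value raises NegativeArraySizeException.

```java
// Hypothetical illustration of int overflow; not taken from TrieDictionary.
class IntOverflowSketch {

    /** Plain int multiplication: silently wraps around on overflow. */
    static int unsafeTotal(int entryCount, int bytesPerEntry) {
        return entryCount * bytesPerEntry;
    }

    /** Math.multiplyExact throws ArithmeticException instead of wrapping. */
    static int safeTotal(int entryCount, int bytesPerEntry) {
        return Math.multiplyExact(entryCount, bytesPerEntry);
    }

    public static void main(String[] args) {
        // 3 * 2^30 bytes = 3 GiB, which does not fit into an int.
        int size = unsafeTotal(3, 1 << 30);
        System.out.println(size); // prints -1073741824

        try {
            byte[] buffer = new byte[size];
            System.out.println(buffer.length);
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException");
        }

        try {
            safeTotal(3, 1 << 30);
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException");
        }
    }
}
```

Failing at the arithmetic step, as safeTotal does, points at the real cause instead of at a later array allocation.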

> java.lang.NegativeArraySizeException when Build Dimension Dictionary
> 
>
> Key: KYLIN-1973
> URL: https://issues.apache.org/jira/browse/KYLIN-1973
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.3
>Reporter: zhengdong
>Assignee: liyang
> Fix For: v1.5.4
>
>
>  exception when Build Dimension Dictionary:
> java.lang.NegativeArraySizeException
>   at 
> org.apache.kylin.dict.TrieDictionary.getValueFromIdImpl(TrieDictionary.java:274)
>   at 
> org.apache.kylin.common.util.Dictionary.getValueFromId(Dictionary.java:130)
>   at 
> org.apache.kylin.dict.lookup.SnapshotTable$1.getRow(SnapshotTable.java:138)
>   at org.apache.kylin.dict.lookup.LookupTable.init(LookupTable.java:67)
>   at 
> org.apache.kylin.dict.lookup.LookupStringTable.init(LookupStringTable.java:79)
>   at org.apache.kylin.dict.lookup.LookupTable.<init>(LookupTable.java:55)
>   at 
> org.apache.kylin.dict.lookup.LookupStringTable.<init>(LookupStringTable.java:65)
>   at 
> org.apache.kylin.cube.CubeManager.getLookupTable(CubeManager.java:619)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:61)
>   at 
> org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>   at 
> org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:112)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:112)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:127)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2





[jira] [Commented] (KYLIN-1989) Sum issue: result not consistent with hive

2016-09-05 Thread Le Van Ha (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464331#comment-15464331
 ] 

Le Van Ha commented on KYLIN-1989:
--

Hi hongbin ma,

You said: "kylin does not support case when in aggregators(sum,min,max) yet. 
Only "case when" for dimension is supported".
But with this query:
SELECT channel.name,product.product_vendor, sum(case when 
product.product_vendor <> 'VietTravel' then fact_product.quantity else 0 end) 
FROM fact_product_sales as fact_product
join dim_product as product on fact_product.product_id = product.id
join dim_channel as channel on fact_product.channel_id = channel.id
GROUP BY channel.name, product.product_vendor

The result by kylin:

Online Store  Ecork       10146
Buy Button    VietTravel  0
Online Store  Aqua Dome   10028
Buy Button    Aqua Dome   10061
Buy Button    Hilton      10044


If the column used by "case when" inside an aggregator (sum, min, max) also 
appears in the "select" or "GROUP BY" of the query, then the correct results 
are returned.

> Sum issue: result not consistent with hive
> --
>
> Key: KYLIN-1989
> URL: https://issues.apache.org/jira/browse/KYLIN-1989
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.5.2
>Reporter: Le Van Ha
>Assignee: hongbin ma
> Attachments: hive_result.png, kylin_result.png
>
>
> When running the following query, 
> SELECT channel.name, sum(case when product.product_vendor <> '' then 
> fact_product.quantity else 0 end) 
> FROM fact_product_sales as fact_product 
> join dim_product as product on fact_product.product_id = product.id 
> join dim_channel as channel on fact_product.channel_id = channel.id  
> GROUP BY channel.name
> ---
> The result by kylin:
> Buy Button    0
> Online Store  0
> ---
> The result by hive is shown in the attached figure (hive_result.png).
> Why is that?





[jira] [Commented] (KYLIN-1965) Check duplicated measure name

2016-09-05 Thread kangkaisen (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464289#comment-15464289
 ] 

kangkaisen commented on KYLIN-1965:
---

Hi [~zhongjian], could you please review KYLIN-1965-bug-fix.patch? Thanks.

> Check duplicated measure name
> -
>
> Key: KYLIN-1965
> URL: https://issues.apache.org/jira/browse/KYLIN-1965
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: v1.5.2, v1.5.3
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v1.5.4
>
> Attachments: 0001-KYLIN-1965-UPDATE.patch, KYLIN-1965-bug-fix.patch, 
> KYLIN-1965.patch
>
>
> Duplicated measure names cause queries to fail, so we should check for 
> duplicated measure names.





[jira] [Updated] (KYLIN-1965) Check duplicated measure name

2016-09-05 Thread kangkaisen (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kangkaisen updated KYLIN-1965:
--
Attachment: KYLIN-1965-bug-fix.patch

This patch fixes the bug in KYLIN-1965.

> Check duplicated measure name
> -
>
> Key: KYLIN-1965
> URL: https://issues.apache.org/jira/browse/KYLIN-1965
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metadata
>Affects Versions: v1.5.2, v1.5.3
>Reporter: kangkaisen
>Assignee: kangkaisen
> Fix For: v1.5.4
>
> Attachments: 0001-KYLIN-1965-UPDATE.patch, KYLIN-1965-bug-fix.patch, 
> KYLIN-1965.patch
>
>
> Duplicated measure names cause queries to fail, so we should check for 
> duplicated measure names.


