[jira] [Commented] (KYLIN-3140) Auto merge jobs should not block user build jobs

2018-02-02 Thread Wang, Gang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350485#comment-16350485
 ] 

Wang, Gang commented on KYLIN-3140:
---

Yes. The administrator does need to take care of each failed job. My point is 
that a failed auto-merge job should not block the user's incremental build job.

Currently, the problem is that if max-building-segments is set to 1, the user 
cannot submit a new build job until the failed auto-merge job is handled 
properly and resumed successfully.

If we differentiate auto-merge jobs from user build/refresh jobs, we can also 
set a separate concurrency threshold for each of them, which may protect the 
Kylin server from OOM and other performance issues.
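
For illustration, a minimal sketch of the separate-threshold idea; all names 
below are hypothetical, and this is not existing Kylin code:

    // Sketch: separate concurrency quotas for auto-merge and user jobs, so a
    // stuck auto-merge job can never consume a user build/refresh slot.
    public class JobQuotaSketch {
        private final int maxUserJobs;   // cap for user build/refresh jobs
        private final int maxMergeJobs;  // independent cap for auto-merge jobs
        private int runningUserJobs;
        private int runningMergeJobs;

        public JobQuotaSketch(int maxUserJobs, int maxMergeJobs) {
            this.maxUserJobs = maxUserJobs;
            this.maxMergeJobs = maxMergeJobs;
        }

        public synchronized boolean tryAcquire(boolean isAutoMerge) {
            if (isAutoMerge) {
                if (runningMergeJobs < maxMergeJobs) {
                    runningMergeJobs++;
                    return true;
                }
                return false;  // merge quota full; user slots stay untouched
            }
            if (runningUserJobs < maxUserJobs) {
                runningUserJobs++;
                return true;
            }
            return false;
        }
    }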

> Auto merge jobs should not block user build jobs
> 
>
> Key: KYLIN-3140
> URL: https://issues.apache.org/jira/browse/KYLIN-3140
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Reporter: Wang, Gang
>Assignee: Shaofeng SHI
>Priority: Major
>
> Although the latest version of Kylin supports concurrent jobs, if the 
> concurrency is set to 1 there is a possibility that cube build jobs will 
> deadlock. Say some issue causes a merge job to fail: even if you discard 
> the job, another one will be launched and fail again due to the auto-merge 
> policy, and this failed merge job blocks the user from building incremental 
> segments.
> Even if the concurrency is set larger than 1, the auto-merge jobs occupy 
> some of the concurrency quota. 
> From the user's perspective, they don't care much about the auto-merge 
> jobs, and the auto-merge jobs should not block the build/refresh jobs they 
> submit manually.
> A better way may be to separate the auto-merge jobs from the job queue, so 
> that the parameter max-building-segments only limits jobs submitted by users.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (KYLIN-3141) Support offset for segment merge

2018-02-02 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang closed KYLIN-3141.
-
Resolution: Duplicate

Duplicate of https://issues.apache.org/jira/browse/KYLIN-1892

> Support offset for segment merge
> 
>
> Key: KYLIN-3141
> URL: https://issues.apache.org/jira/browse/KYLIN-3141
> Project: Kylin
>  Issue Type: New Feature
>  Components: Job Engine
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>Priority: Minor
>
> This is a request to add an offset to the Kylin segment merge so as to avoid 
> immediate merging of segments after a 7-day / 30-day window.
> Introducing a delay (offset) would help in two ways:
> a) When auto merge kicks off, I have a new segment and my daily incremental 
> segment build script will fail because it won't find the last segment.
> b) In many use cases I may need to backfill data for some days of the 
> previous week, but I end up refreshing the whole merged segment instead of 
> just a day or two.
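
For illustration, a minimal sketch of the requested behavior; the names below 
are hypothetical, not Kylin code:

    import java.util.ArrayList;
    import java.util.List;

    // Sketch: with a merge offset, auto merge only considers segments that
    // end before (latest data time - offset), leaving recent segments free
    // for incremental builds and backfills.
    public class MergeOffsetSketch {
        record Segment(long start, long end) {} // [start, end) in epoch millis

        static List<Segment> mergeable(List<Segment> segments, long latestEnd,
                                       long offsetMillis) {
            long cutoff = latestEnd - offsetMillis;
            List<Segment> result = new ArrayList<>();
            for (Segment s : segments) {
                if (s.end() <= cutoff) {
                    result.add(s); // old enough to merge
                }
            }
            return result;
        }
    }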



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KYLIN-3141) Support offset for segment merge

2018-02-02 Thread Wang, Gang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350442#comment-16350442
 ] 

Wang, Gang commented on KYLIN-3141:
---

Yes! That is what we want.

Will close this ticket.

> Support offset for segment merge
> 
>
> Key: KYLIN-3141
> URL: https://issues.apache.org/jira/browse/KYLIN-3141
> Project: Kylin
>  Issue Type: New Feature
>  Components: Job Engine
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>Priority: Minor
>
> This is a request to add an offset to the Kylin segment merge so as to avoid 
> immediate merging of segments after a 7-day / 30-day window.
> Introducing a delay (offset) would help in two ways:
> a) When auto merge kicks off, I have a new segment and my daily incremental 
> segment build script will fail because it won't find the last segment.
> b) In many use cases I may need to backfill data for some days of the 
> previous week, but I end up refreshing the whole merged segment instead of 
> just a day or two.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KYLIN-2984) improve the way to delete a job

2018-01-06 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-2984:
--
Attachment: 0001-improve-the-way-to-delete-a-job.patch

> improve the way to delete a job
> ---
>
> Key: KYLIN-2984
> URL: https://issues.apache.org/jira/browse/KYLIN-2984
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhong Yanghong
>Assignee: Wang, Gang
>Priority: Trivial
> Attachments: 0001-improve-the-way-to-delete-a-job.patch
>
>
> Currently a user can directly delete a job. However, when the job status is 
> RUNNING, the related segment in NEW state is not deleted. 
> I think we should not allow users to delete a job that is not in the 
> FINISHED or DISCARDED state.
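
For illustration, a minimal sketch of the proposed guard; the names are 
hypothetical, not the actual patch:

    // Sketch: allow deletion only for jobs in a terminal state.
    public class JobDeleteGuard {
        enum JobStatus { NEW, PENDING, RUNNING, ERROR, FINISHED, DISCARDED }

        static void checkDeletable(JobStatus status) {
            if (status != JobStatus.FINISHED && status != JobStatus.DISCARDED) {
                throw new IllegalStateException(
                    "Job can only be deleted in FINISHED or DISCARDED state, current: "
                    + status);
            }
        }
    }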



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KYLIN-2984) improve the way to delete a job

2018-01-06 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-2984:
--
Attachment: (was: 0001-improve-the-way-to-delete-a-job.patch)

> improve the way to delete a job
> ---
>
> Key: KYLIN-2984
> URL: https://issues.apache.org/jira/browse/KYLIN-2984
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhong Yanghong
>Assignee: Wang, Gang
>Priority: Trivial
>
> Currently a user can directly delete a job. However, when the job status is 
> RUNNING, the related segment in NEW state is not deleted. 
> I think we should not allow users to delete a job that is not in the 
> FINISHED or DISCARDED state.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KYLIN-2984) improve the way to delete a job

2018-01-06 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-2984:
--
Attachment: 0001-improve-the-way-to-delete-a-job.patch

> improve the way to delete a job
> ---
>
> Key: KYLIN-2984
> URL: https://issues.apache.org/jira/browse/KYLIN-2984
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhong Yanghong
>Assignee: Wang, Gang
>Priority: Trivial
> Attachments: 0001-improve-the-way-to-delete-a-job.patch
>
>
> Currently a user can directly delete a job. However, when the job status is 
> RUNNING, the related segment in NEW state is not deleted. 
> I think we should not allow users to delete a job that is not in the 
> FINISHED or DISCARDED state.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2984) improve the way to delete a job

2018-01-06 Thread Wang, Gang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315085#comment-16315085
 ] 

Wang, Gang commented on KYLIN-2984:
---

A patch is attached. Please help review. Thanks.

> improve the way to delete a job
> ---
>
> Key: KYLIN-2984
> URL: https://issues.apache.org/jira/browse/KYLIN-2984
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhong Yanghong
>Assignee: Wang, Gang
>Priority: Trivial
> Attachments: 0001-improve-the-way-to-delete-a-job.patch
>
>
> Currently a user can directly delete a job. However, when the job status is 
> RUNNING, the related segment in NEW state is not deleted. 
> I think we should not allow users to delete a job that is not in the 
> FINISHED or DISCARDED state.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-1403) Kylin Hive Column Cardinality Job unable to read bucketed table

2018-01-02 Thread Wang, Gang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309196#comment-16309196
 ] 

Wang, Gang commented on KYLIN-1403:
---

Tested with Hive 1.2 and Kylin 2.1: HCatalog works fine with the TXT, Parquet, 
and ORC formats. This may not be an issue anymore.

set hive.enforce.bucketing = true;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
create table testBucket_parquet (x int, y int) partitioned by (z int) clustered 
by (x) into 10 buckets STORED AS PARQUET;
insert into table testBucket_parquet partition(z) values (1, 1, 1);
insert into table testBucket_parquet partition(z) values (2, 1, 1);
insert into table testBucket_parquet partition(z) values (2, 1, 2);
insert into table testBucket_parquet partition(z) values (1, 1, 2);


> Kylin Hive Column Cardinality Job unable to read bucketed table
> ---
>
> Key: KYLIN-1403
> URL: https://issues.apache.org/jira/browse/KYLIN-1403
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.2, v1.3.0
> Environment: - Tested against 
> apache-kylin-1.2-HBase1.1-incubating-SNAPSHOT-bin and 
> apache-kylin-1.3-HBase-1.1-SNAPSHOT-bin
> - Environment is HDP 2.3.4 
> - Hive version: hive-1.2.1.2.3.4.0
> - HBase version: HBase 1.1.2.2.3.4.0-3485
>Reporter: Sebastian Zimmermann
>Assignee: Wang, Gang
>  Labels: newbie
>
> This issue is connected with https://issues.apache.org/jira/browse/KYLIN-1402 
> and states the findings made while investigating the 
> StringIndexOutOfBoundsException.
> While trying to find out why the output file created by the cardinality job 
> is empty, we discovered that the only difference between this non-working 
> job and all our other jobs (which work without problems) is that the 
> underlying table is bucketed. 
> The data folder is dbfolder/db/table/partition/bucketfolder/file.
> Kylin checks for data in dbfolder/db/table/partition and so is unable to 
> find the data.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2903) Support cardinality calculation for Hive view

2018-01-02 Thread Wang, Gang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309078#comment-16309078
 ] 

Wang, Gang commented on KYLIN-2903:
---

Yes, Shaofeng. I will take this ticket.

> Support cardinality calculation for Hive view
> -
>
> Key: KYLIN-2903
> URL: https://issues.apache.org/jira/browse/KYLIN-2903
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>Priority: Minor
> Attachments: 
> 0001-KYLIN-2903-support-cardinality-calculation-for-Hive-.patch
>
>
> Currently, Kylin leverages HCatalog to calculate column cardinality for Hive 
> tables. However, HCatalog does not actually support Hive views. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (KYLIN-1403) Kylin Hive Column Cardinality Job unable to read bucketed table

2018-01-02 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang reassigned KYLIN-1403:
-

Assignee: Wang, Gang  (was: hongbin ma)

> Kylin Hive Column Cardinality Job unable to read bucketed table
> ---
>
> Key: KYLIN-1403
> URL: https://issues.apache.org/jira/browse/KYLIN-1403
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v1.2, v1.3.0
> Environment: - Tested against 
> apache-kylin-1.2-HBase1.1-incubating-SNAPSHOT-bin and 
> apache-kylin-1.3-HBase-1.1-SNAPSHOT-bin
> - Environment is HDP 2.3.4 
> - Hive version: hive-1.2.1.2.3.4.0
> - HBase version: HBase 1.1.2.2.3.4.0-3485
>Reporter: Sebastian Zimmermann
>Assignee: Wang, Gang
>  Labels: newbie
>
> This issue is connected with https://issues.apache.org/jira/browse/KYLIN-1402 
> and states the findings made while investigating the 
> StringIndexOutOfBoundsException.
> While trying to find out why the output file created by the cardinality job 
> is empty, we discovered that the only difference between this non-working 
> job and all our other jobs (which work without problems) is that the 
> underlying table is bucketed. 
> The data folder is dbfolder/db/table/partition/bucketfolder/file.
> Kylin checks for data in dbfolder/db/table/partition and so is unable to 
> find the data.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (KYLIN-3141) Support offset for segment merge

2017-12-28 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang reassigned KYLIN-3141:
-

   Assignee: Wang, Gang
Component/s: Job Engine

> Support offset for segment merge
> 
>
> Key: KYLIN-3141
> URL: https://issues.apache.org/jira/browse/KYLIN-3141
> Project: Kylin
>  Issue Type: New Feature
>  Components: Job Engine
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>Priority: Minor
>
> This is a request to add an offset to the Kylin segment merge so as to avoid 
> immediate merging of segments after a 7-day / 30-day window.
> Introducing a delay (offset) would help in two ways:
> a) When auto merge kicks off, I have a new segment and my daily incremental 
> segment build script will fail because it won't find the last segment.
> b) In many use cases I may need to backfill data for some days of the 
> previous week, but I end up refreshing the whole merged segment instead of 
> just a day or two.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KYLIN-3141) Support offset for segment merge

2017-12-28 Thread Wang, Gang (JIRA)
Wang, Gang created KYLIN-3141:
-

 Summary: Support offset for segment merge
 Key: KYLIN-3141
 URL: https://issues.apache.org/jira/browse/KYLIN-3141
 Project: Kylin
  Issue Type: New Feature
Reporter: Wang, Gang
Priority: Minor


This is a request to add an offset to the Kylin segment merge so as to avoid 
immediate merging of segments after a 7-day / 30-day window.

Introducing a delay (offset) would help in two ways:

a) When auto merge kicks off, I have a new segment and my daily incremental 
segment build script will fail because it won't find the last segment.
b) In many use cases I may need to backfill data for some days of the previous 
week, but I end up refreshing the whole merged segment instead of just a day 
or two.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KYLIN-3140) Auto merge jobs should not block user build jobs

2017-12-28 Thread Wang, Gang (JIRA)
Wang, Gang created KYLIN-3140:
-

 Summary: Auto merge jobs should not block user build jobs
 Key: KYLIN-3140
 URL: https://issues.apache.org/jira/browse/KYLIN-3140
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine
Reporter: Wang, Gang
Assignee: Shaofeng SHI


Although the latest version of Kylin supports concurrent jobs, if the 
concurrency is set to 1 there is a possibility that cube build jobs will 
deadlock. Say some issue causes a merge job to fail: even if you discard the 
job, another one will be launched and fail again due to the auto-merge policy, 
and this failed merge job blocks the user from building incremental segments.
Even if the concurrency is set larger than 1, the auto-merge jobs occupy some 
of the concurrency quota. 

From the user's perspective, they don't care much about the auto-merge jobs, 
and the auto-merge jobs should not block the build/refresh jobs they submit 
manually.
A better way may be to separate the auto-merge jobs from the job queue, so that 
the parameter max-building-segments only limits jobs submitted by users.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2903) Support cardinality calculation for Hive view

2017-12-28 Thread Wang, Gang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306028#comment-16306028
 ] 

Wang, Gang commented on KYLIN-2903:
---

Sorry for the late response. Since the calculation depends on an MR job, it is 
quite slow compared with HyperLogLog. However, the calculation happens at Hive 
table loading (or is triggered manually), and there is usually some time before 
cube building, so the duration may be acceptable. Also, the cardinality is 
pretty valuable for cube builders who have little knowledge of the data in the 
Hive tables. I think this may be one way.

> Support cardinality calculation for Hive view
> -
>
> Key: KYLIN-2903
> URL: https://issues.apache.org/jira/browse/KYLIN-2903
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>Priority: Minor
> Attachments: 
> 0001-KYLIN-2903-support-cardinality-calculation-for-Hive-.patch
>
>
> Currently, Kylin leverages HCatalog to calculate column cardinality for Hive 
> tables. However, HCatalog does not actually support Hive views. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KYLIN-2913) Enable job retry for configurable exceptions

2017-12-20 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-2913:
--
Attachment: 0001-KYLIN-2913-Enable-job-retry-for-configurable-excepti.patch

Makes sense. Resubmitted a patch.
Retry will happen in the following cases:
1) If the property "kylin.job.retry-exception-classes" is not set or is null, 
all jobs that fail with an exception will retry up to the configured number of 
retries.
2) If the property "kylin.job.retry-exception-classes" is set and is not null, 
only jobs that fail with one of the specified exceptions will retry up to the 
configured number of retries.
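
For illustration, here is a minimal sketch of that decision logic. Only the 
property name "kylin.job.retry-exception-classes" comes from the patch; the 
class and method names below are hypothetical, not the actual implementation:

    // Sketch (hypothetical names): decide whether a failed job may retry.
    public class RetryPolicySketch {
        // exceptionClass: fully qualified name of the exception the job threw
        // configured: parsed value of "kylin.job.retry-exception-classes"
        static boolean isRetryable(String exceptionClass, String[] configured) {
            // Case 1: property not set or null -> any exception is retryable.
            if (configured == null || configured.length == 0) {
                return true;
            }
            // Case 2: property set -> only the listed exceptions are retryable.
            for (String clazz : configured) {
                if (clazz.equals(exceptionClass)) {
                    return true;
                }
            }
            return false;
        }
    }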


> Enable job retry for configurable exceptions
> 
>
> Key: KYLIN-2913
> URL: https://issues.apache.org/jira/browse/KYLIN-2913
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.1.0
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>Priority: Minor
> Fix For: v2.3.0
>
> Attachments: 
> 0001-KYLIN-2913-Enable-job-retry-for-configurable-excepti.patch
>
>
> In our production environment, we often get certain exceptions from Hadoop 
> or HBase, like 
> "org.apache.kylin.job.exception.NoEnoughReplicationException" or 
> "java.util.ConcurrentModificationException", which result in job failure. 
> These exceptions can actually be handled by retrying, so it would be much 
> more convenient if we were able to make jobs retry on configurable 
> exceptions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KYLIN-2913) Enable job retry for configurable exceptions

2017-12-20 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-2913:
--
Attachment: (was: 
0001-KYLIN-2913-Enable-job-retry-for-configurable-excepti.patch)

> Enable job retry for configurable exceptions
> 
>
> Key: KYLIN-2913
> URL: https://issues.apache.org/jira/browse/KYLIN-2913
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.1.0
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>Priority: Minor
> Fix For: v2.3.0
>
>
> In our production environment, we often get certain exceptions from Hadoop 
> or HBase, like 
> "org.apache.kylin.job.exception.NoEnoughReplicationException" or 
> "java.util.ConcurrentModificationException", which result in job failure. 
> These exceptions can actually be handled by retrying, so it would be much 
> more convenient if we were able to make jobs retry on configurable 
> exceptions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2956) building trie dictionary blocked on value of length over 4095

2017-12-20 Thread Wang, Gang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299649#comment-16299649
 ] 

Wang, Gang commented on KYLIN-2956:
---

Resubmitted a patch: set the mask to 0x8000 and added a UT.

> building trie dictionary blocked on value of length over 4095 
> --
>
> Key: KYLIN-2956
> URL: https://issues.apache.org/jira/browse/KYLIN-2956
> Project: Kylin
>  Issue Type: Bug
>  Components: General
>Reporter: Wang, Gang
>Assignee: Wang, Gang
> Attachments: 
> 0001-KYLIN-2956-building-trie-dictionary-blocked-on-value.patch
>
>
> In the new release, Kylin checks the value length when building the trie 
> dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through:
> private void positiveShortPreCheck(int i, String fieldName) {
> if (!BytesUtil.isPositiveShort(i)) {
> throw new IllegalStateException(fieldName + " is not positive short, 
> usually caused by too long dict value.");
> }
> }
> public static boolean isPositiveShort(int i) {
> return (i & 0x7000) == 0;
> }
> And 0x7000 in binary is 0111 0000 0000 0000, so the value length must be 
> less than 0001 0000 0000 0000 (4096), i.e. at most 4095 in decimal.
> I wonder why it is 0x7000. Is 0x8000 (1000 0000 0000 0000), supporting a 
> max length of 0111 1111 1111 1111 (32767), what you wanted? 
> Or, if 32767 is too large, I would prefer 0xE000 (1110 0000 0000 0000), 
> supporting a max length of 0001 1111 1111 1111 (8191).
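
For illustration, a tiny self-contained check of the three masks discussed 
above (this is not Kylin code):

    // Prints the largest positive-short value each candidate mask accepts
    // under the check (i & mask) == 0.
    public class MaskBoundaries {
        static int maxAccepted(int mask) {
            int max = -1;
            for (int i = 0; i <= Short.MAX_VALUE; i++) {
                if ((i & mask) == 0) {
                    max = i; // remember the last accepted value
                }
            }
            return max;
        }

        public static void main(String[] args) {
            System.out.println(maxAccepted(0x7000)); // 4095
            System.out.println(maxAccepted(0x8000)); // 32767
            System.out.println(maxAccepted(0xE000)); // 8191
        }
    }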



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KYLIN-2956) building trie dictionary blocked on value of length over 4095

2017-12-20 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-2956:
--
Attachment: 0001-KYLIN-2956-building-trie-dictionary-blocked-on-value.patch

> building trie dictionary blocked on value of length over 4095 
> --
>
> Key: KYLIN-2956
> URL: https://issues.apache.org/jira/browse/KYLIN-2956
> Project: Kylin
>  Issue Type: Bug
>  Components: General
>Reporter: Wang, Gang
>Assignee: Wang, Gang
> Attachments: 
> 0001-KYLIN-2956-building-trie-dictionary-blocked-on-value.patch
>
>
> In the new release, Kylin checks the value length when building the trie 
> dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through:
> private void positiveShortPreCheck(int i, String fieldName) {
> if (!BytesUtil.isPositiveShort(i)) {
> throw new IllegalStateException(fieldName + " is not positive short, 
> usually caused by too long dict value.");
> }
> }
> public static boolean isPositiveShort(int i) {
> return (i & 0x7000) == 0;
> }
> And 0x7000 in binary is 0111 0000 0000 0000, so the value length must be 
> less than 0001 0000 0000 0000 (4096), i.e. at most 4095 in decimal.
> I wonder why it is 0x7000. Is 0x8000 (1000 0000 0000 0000), supporting a 
> max length of 0111 1111 1111 1111 (32767), what you wanted? 
> Or, if 32767 is too large, I would prefer 0xE000 (1110 0000 0000 0000), 
> supporting a max length of 0001 1111 1111 1111 (8191).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KYLIN-2956) building trie dictionary blocked on value of length over 4095

2017-12-20 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-2956:
--
Attachment: (was: 
0001-KYLIN-2956-building-trie-dictionary-blocked-on-value.patch)

> building trie dictionary blocked on value of length over 4095 
> --
>
> Key: KYLIN-2956
> URL: https://issues.apache.org/jira/browse/KYLIN-2956
> Project: Kylin
>  Issue Type: Bug
>  Components: General
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>
> In the new release, Kylin checks the value length when building the trie 
> dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through:
> private void positiveShortPreCheck(int i, String fieldName) {
> if (!BytesUtil.isPositiveShort(i)) {
> throw new IllegalStateException(fieldName + " is not positive short, 
> usually caused by too long dict value.");
> }
> }
> public static boolean isPositiveShort(int i) {
> return (i & 0x7000) == 0;
> }
> And 0x7000 in binary is 0111 0000 0000 0000, so the value length must be 
> less than 0001 0000 0000 0000 (4096), i.e. at most 4095 in decimal.
> I wonder why it is 0x7000. Is 0x8000 (1000 0000 0000 0000), supporting a 
> max length of 0111 1111 1111 1111 (32767), what you wanted? 
> Or, if 32767 is too large, I would prefer 0xE000 (1110 0000 0000 0000), 
> supporting a max length of 0001 1111 1111 1111 (8191).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KYLIN-2903) support cardinality calculation for Hive view

2017-12-20 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-2903:
--
Attachment: 0001-KYLIN-2903-support-cardinality-calculation-for-Hive-.patch

Add UT.

> support cardinality calculation for Hive view
> -
>
> Key: KYLIN-2903
> URL: https://issues.apache.org/jira/browse/KYLIN-2903
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>Priority: Minor
> Attachments: 
> 0001-KYLIN-2903-support-cardinality-calculation-for-Hive-.patch
>
>
> Currently, Kylin leverages HCatalog to calculate column cardinality for Hive 
> tables. However, HCatalog does not actually support Hive views. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KYLIN-2903) support cardinality calculation for Hive view

2017-12-20 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-2903:
--
Attachment: (was: 
0001-KYLIN-2903-support-cardinality-calculation-for-Hive-.patch)

> support cardinality calculation for Hive view
> -
>
> Key: KYLIN-2903
> URL: https://issues.apache.org/jira/browse/KYLIN-2903
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>Priority: Minor
>
> Currently, Kylin leverages HCatalog to calculate column cardinality for Hive 
> tables. However, HCatalog does not actually support Hive views. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KYLIN-2903) support cardinality calculation for Hive view

2017-12-19 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-2903:
--
Attachment: 0001-KYLIN-2903-support-cardinality-calculation-for-Hive-.patch

Attached a patch.
One way is to leverage an HQL 'COUNT DISTINCT' statement to calculate column 
cardinality, and use 'INSERT OVERWRITE DIRECTORY' to put the result in the 
output path. To make the output recognizable for the following step, 
HiveColumnCardinalityUpdateJob, it needs to follow this format:
column1 cardinality
column2 cardinality
column3 cardinality
...

This can be achieved by setting 'ROW FORMAT DELIMITED' and adding line breaks 
in the HQL.
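
For illustration, a minimal sketch of the kind of HQL this approach generates; 
the table, columns, and output path are hypothetical, and this is not the 
actual patch:

    // Sketch: build HQL that writes one "column cardinality" line per column.
    public class CardinalityHqlSketch {
        static String buildHql(String table, String outputDir, String[] columns) {
            StringBuilder hql = new StringBuilder();
            hql.append("INSERT OVERWRITE DIRECTORY '").append(outputDir).append("'\n");
            hql.append("ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '\n");
            hql.append("SELECT * FROM (\n");
            for (int i = 0; i < columns.length; i++) {
                if (i > 0) {
                    hql.append("  UNION ALL\n");
                }
                hql.append("  SELECT '").append(columns[i]).append("' AS col_name, ")
                   .append("COUNT(DISTINCT ").append(columns[i]).append(") AS card\n")
                   .append("  FROM ").append(table).append("\n");
            }
            hql.append(") t");
            return hql.toString();
        }

        public static void main(String[] args) {
            System.out.println(buildHql("default.my_view", "/tmp/cardinality/my_view",
                    new String[] { "col1", "col2", "col3" }));
        }
    }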

> support cardinality calculation for Hive view
> -
>
> Key: KYLIN-2903
> URL: https://issues.apache.org/jira/browse/KYLIN-2903
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>Priority: Minor
> Attachments: 
> 0001-KYLIN-2903-support-cardinality-calculation-for-Hive-.patch
>
>
> Currently, Kylin leverages HCatalog to calculate column cardinality for Hive 
> tables. However, HCatalog does not actually support Hive views. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KYLIN-2913) Enable job retry for configurable exceptions

2017-12-19 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-2913:
--
Priority: Minor  (was: Major)

> Enable job retry for configurable exceptions
> 
>
> Key: KYLIN-2913
> URL: https://issues.apache.org/jira/browse/KYLIN-2913
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.1.0
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>Priority: Minor
> Fix For: v2.3.0
>
> Attachments: 
> 0001-KYLIN-2913-Enable-job-retry-for-configurable-excepti.patch
>
>
> In our production environment, we often get certain exceptions from Hadoop 
> or HBase, like 
> "org.apache.kylin.job.exception.NoEnoughReplicationException" or 
> "java.util.ConcurrentModificationException", which result in job failure. 
> These exceptions can actually be handled by retrying, so it would be much 
> more convenient if we were able to make jobs retry on configurable 
> exceptions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KYLIN-2913) Enable job retry for configurable exceptions

2017-12-19 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-2913:
--
Attachment: 0001-KYLIN-2913-Enable-job-retry-for-configurable-excepti.patch

Add property "kylin.job.retry-exception-classes" to configure retryable 
exceptions. Patch is attached.

> Enable job retry for configurable exceptions
> 
>
> Key: KYLIN-2913
> URL: https://issues.apache.org/jira/browse/KYLIN-2913
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.1.0
>Reporter: Wang, Gang
>Assignee: Wang, Gang
> Fix For: v2.3.0
>
> Attachments: 
> 0001-KYLIN-2913-Enable-job-retry-for-configurable-excepti.patch
>
>
> In our production environment, we often get certain exceptions from Hadoop 
> or HBase, like 
> "org.apache.kylin.job.exception.NoEnoughReplicationException" or 
> "java.util.ConcurrentModificationException", which result in job failure. 
> These exceptions can actually be handled by retrying, so it would be much 
> more convenient if we were able to make jobs retry on configurable 
> exceptions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KYLIN-3115) Incompatible RowKeySplitter initialize between build and merge job

2017-12-18 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-3115:
--
Component/s: Job Engine

> Incompatible RowKeySplitter initialize between build and merge job
> --
>
> Key: KYLIN-3115
> URL: https://issues.apache.org/jira/browse/KYLIN-3115
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>Priority: Minor
>
> In class NDCuboidBuilder:
> public NDCuboidBuilder(CubeSegment cubeSegment) {
> this.cubeSegment = cubeSegment;
> this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256);
> this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
> } 
> which creates a byte array of length 256 to hold the rowkey column bytes.
> However, in class MergeCuboidMapper it is initialized with length 255: 
> rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);
> So, if a dimension is encoded in fixed length and the max length is set to 
> 256, the cube build job will succeed while the merge job will always fail, 
> since in class MergeCuboidMapper, method doMap:
> public void doMap(Text key, Text value, Context context) throws 
> IOException, InterruptedException {
> long cuboidID = rowKeySplitter.split(key.getBytes());
> Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);
> doMap invokes RowKeySplitter.split(byte[] bytes):
> for (int i = 0; i < cuboid.getColumns().size(); i++) {
> splitOffsets[i] = offset;
> TblColRef col = cuboid.getColumns().get(i);
> int colLength = colIO.getColumnLength(col);
> SplittedBytes split = this.splitBuffers[this.bufferSize++];
> split.length = colLength;
> System.arraycopy(bytes, offset, split.value, 0, colLength);
> offset += colLength;
> }
> System.arraycopy throws an IndexOutOfBoundsException if a column value is 
> 256 bytes long and is copied into a byte array of length 255.
> The same incompatibility occurs in class FilterRecommendCuboidDataMapper, 
> which initializes the RowKeySplitter as: 
> rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);
> I think the better way is to always set the max split length to 256.
> Dimensions encoded in fixed length 256 are actually pretty common in our 
> production: the Hive type varchar(256) is very common, and users without 
> much Kylin knowledge tend to choose fixed-length encoding for such 
> dimensions and set the max length to 256.
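
For illustration, a minimal standalone reproduction of the copy failure 
described above (not Kylin code; the buffer sizes mirror the 255/256 mismatch):

    public class ArrayCopyDemo {
        public static void main(String[] args) {
            byte[] rowkey = new byte[512];  // stands in for the serialized rowkey
            byte[] buffer = new byte[255];  // MergeCuboidMapper-sized split buffer
            int colLength = 256;            // fixed-length dimension of 256 bytes
            // Throws ArrayIndexOutOfBoundsException: the buffer is one byte short.
            System.arraycopy(rowkey, 0, buffer, 0, colLength);
        }
    }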



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KYLIN-3115) Incompatible RowKeySplitter initialize between build and merge job

2017-12-17 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-3115:
--
Description: 
In class NDCuboidBuilder:
public NDCuboidBuilder(CubeSegment cubeSegment) {
this.cubeSegment = cubeSegment;
this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256);
this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
} 
which will create a bytes array with length 256 to fill in rowkey column bytes.

While, in class MergeCuboidMapper it's initialized with length 255. 
rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);

So, if a dimension is encoded in fixed length and the max length is set to 256. 
The cube building job will succeed. While, the merge job will always fail. 
Since in class MergeCuboidMapper method doMap:
public void doMap(Text key, Text value, Context context) throws 
IOException, InterruptedException {
long cuboidID = rowKeySplitter.split(key.getBytes());
Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);

in method doMap, it will invoke method RowKeySplitter.split(byte[] bytes):
for (int i = 0; i < cuboid.getColumns().size(); i++) {
splitOffsets[i] = offset;
TblColRef col = cuboid.getColumns().get(i);
int colLength = colIO.getColumnLength(col);
SplittedBytes split = this.splitBuffers[this.bufferSize++];
split.length = colLength;
System.arraycopy(bytes, offset, split.value, 0, colLength);
offset += colLength;
}
Method System.arraycopy will result in IndexOutOfBoundsException exception, if 
a column value length is 256 in bytes and is being copied to a bytes array with 
length 255.

The incompatibility is also occurred in class FilterRecommendCuboidDataMapper, 
initialize RowkeySplitter as: 
rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);

I think the better way is to always set the max split length as 256.
And actually dimension encoded in fix length 256 is pretty common in our 
production. Since in Hive, type varchar(256) is pretty common, users do have 
not much Kylin knowledge will prefer to chose fix length encoding on such 
dimensions, and set max length as 256. 

  was:
In class NDCuboidBuilder:
public NDCuboidBuilder(CubeSegment cubeSegment) {
this.cubeSegment = cubeSegment;
this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256);
this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
} 
which will create a bytes array with length 256 to fill in rowkey column bytes.

While, in class MergeCuboidMapper it's initialized with length 255. 
rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);

So, if a dimension is encoded in fixed length and the max length is set to 256. 
The cube building job will succeed. While, the merge job will always fail. 
Since in class MergeCuboidMapper method doMap:
public void doMap(Text key, Text value, Context context) throws 
IOException, InterruptedException {
long cuboidID = rowKeySplitter.split(key.getBytes());
Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);

in method doMap, it will invoke method RowKeySplitter.split(byte[] bytes):
for (int i = 0; i < cuboid.getColumns().size(); i++) {
splitOffsets[i] = offset;
TblColRef col = cuboid.getColumns().get(i);
int colLength = colIO.getColumnLength(col);
SplittedBytes split = this.splitBuffers[this.bufferSize++];
split.length = colLength;
System.arraycopy(bytes, offset, split.value, 0, colLength);_
offset += colLength;
}
Method System.arraycopy will result in IndexOutOfBoundsException exception, if 
a column value length is 256 in bytes and is being copied to a bytes array with 
length 255.

The incompatibility is also occurred in class FilterRecommendCuboidDataMapper, 
initialize RowkeySplitter as: 
rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);

I think the better way is to always set the max split length as 256.
And actually dimension encoded in fix length 256 is pretty common in our 
production. Since in Hive, type varchar(256) is pretty common, users does have 
not much knowledge will prefer to chose fix length encoding on such dimensions, 
and set max length as 256. 







> Incompatible RowKeySplitter initialize between build and merge job
> --
>
> Key: KYLIN-3115
> URL: https://issues.apache.org/jira/browse/KYLIN-3115
> Project: Kylin
>  Issue Type: Bug
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>Priority: Minor
>
> In class NDCuboidBuilder:
> public NDCuboidBuilder(CubeSegment cubeSegment) {
> this.cubeSegment = cub

[jira] [Updated] (KYLIN-3115) Incompatible RowKeySplitter initialize between build and merge job

2017-12-17 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-3115:
--
Description: 
In class NDCuboidBuilder:
public NDCuboidBuilder(CubeSegment cubeSegment) {
this.cubeSegment = cubeSegment;
this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256);
this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
} 
which will create a bytes array with length 256 to fill in rowkey column bytes.

While, in class MergeCuboidMapper it's initialized with length 255. 
rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);

So, if a dimension is encoded in fixed length and the max length is set to 256. 
The cube building job will succeed. While, the merge job will always fail. 
Since in class MergeCuboidMapper method doMap:
public void doMap(Text key, Text value, Context context) throws 
IOException, InterruptedException {
long cuboidID = rowKeySplitter.split(key.getBytes());
Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);

in method doMap, it will invoke method RowKeySplitter.split(byte[] bytes):
for (int i = 0; i < cuboid.getColumns().size(); i++) {
splitOffsets[i] = offset;
TblColRef col = cuboid.getColumns().get(i);
int colLength = colIO.getColumnLength(col);
SplittedBytes split = this.splitBuffers[this.bufferSize++];
split.length = colLength;
System.arraycopy(bytes, offset, split.value, 0, colLength);_
offset += colLength;
}
Method System.arraycopy will result in IndexOutOfBoundsException exception, if 
a column value length is 256 in bytes and is being copied to a bytes array with 
length 255.

The incompatibility is also occurred in class FilterRecommendCuboidDataMapper, 
initialize RowkeySplitter as: 
rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);

I think the better way is to always set the max split length as 256.
And actually dimension encoded in fix length 256 is pretty common in our 
production. Since in Hive, type varchar(256) is pretty common, users does have 
not much knowledge will prefer to chose fix length encoding on such dimensions, 
and set max length as 256. 






  was:
In class NDCuboidBuilder. 
public NDCuboidBuilder(CubeSegment cubeSegment) {
this.cubeSegment = cubeSegment;
this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256);
this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
}
which will create a temp bytes array with length 256 to fill in rowkey column 
bytes.

While, in class MergeCuboidMapper it's initialized with length 255. 
rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);

So, if a dimension is encoded in fixed length and the max length is set to 256. 
The cube building job will succeed. While, the merge job will always fail. 
Since in class MergeCuboidMapper method doMap:
public void doMap(Text key, Text value, Context context) throws 
IOException, InterruptedException {
long cuboidID = rowKeySplitter.split(key.getBytes());
Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);

in method doMap, it will invoke method RowKeySplitter.split(byte[] bytes):
for (int i = 0; i < cuboid.getColumns().size(); i++) {
splitOffsets[i] = offset;
TblColRef col = cuboid.getColumns().get(i);
int colLength = colIO.getColumnLength(col);
SplittedBytes split = this.splitBuffers[this.bufferSize++];
split.length = colLength;
System.arraycopy(bytes, offset, split.value, 0, colLength);_
offset += colLength;
}
Method System.arraycopy will result in IndexOutOfBoundsException exception, if 
a column value length is 256 in bytes and is being copied to a bytes array with 
length 255.

The incompatibility is also occurred in class FilterRecommendCuboidDataMapper, 
initialize RowkeySplitter as: 
rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);

I think the better way is to always set the max split length as 256.
And actually dimension encoded in fix length 256 is pretty common in our 
production. Since in Hive, type varchar(256) is pretty common, users does have 
not much knowledge will prefer to chose fix length encoding on such dimensions, 
and set max length as 256. 







> Incompatible RowKeySplitter initialize between build and merge job
> --
>
> Key: KYLIN-3115
> URL: https://issues.apache.org/jira/browse/KYLIN-3115
> Project: Kylin
>  Issue Type: Bug
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>Priority: Minor
>
> In class NDCuboidBuilder:
> public NDCuboidBuilder(CubeSegment cubeSegment) {
> this.cubeSegment = cubeSegment;
> 

[jira] [Updated] (KYLIN-3115) Incompatible RowKeySplitter initialize between build and merge job

2017-12-17 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-3115:
--
Description: 
In class NDCuboidBuilder. 
public NDCuboidBuilder(CubeSegment cubeSegment) {
this.cubeSegment = cubeSegment;
this.rowKeySplitter =* new RowKeySplitter(cubeSegment, 65, 256)*;
this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
}
which will create a temp bytes array with length 256 to fill in rowkey column 
bytes.

While, in class MergeCuboidMapper it's initialized with length 255. 
rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);

So, if a dimension is encoded in fixed length and the max length is set to 256. 
The cube building job will succeed. While, the merge job will always fail. 
Since in class MergeCuboidMapper method doMap:
public void doMap(Text key, Text value, Context context) throws 
IOException, InterruptedException {
long cuboidID = rowKeySplitter.split(key.getBytes());
Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);

in method doMap, it will invoke method RowKeySplitter.split(byte[] bytes):
for (int i = 0; i < cuboid.getColumns().size(); i++) {
splitOffsets[i] = offset;
TblColRef col = cuboid.getColumns().get(i);
int colLength = colIO.getColumnLength(col);
SplittedBytes split = this.splitBuffers[this.bufferSize++];
split.length = colLength;
System.arraycopy(bytes, offset, split.value, 0, colLength);_
offset += colLength;
}
Method System.arraycopy will result in IndexOutOfBoundsException exception, if 
a column value length is 256 in bytes and is being copied to a bytes array with 
length 255.

The incompatibility is also occurred in class FilterRecommendCuboidDataMapper, 
initialize RowkeySplitter as: 
rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);

I think the better way is to always set the max split length as 256.
And actually dimension encoded in fix length 256 is pretty common in our 
production. Since in Hive, type varchar(256) is pretty common, users does have 
not much knowledge will prefer to chose fix length encoding on such dimensions, 
and set max length as 256. 






  was:
In class NDCuboidBuilder. 
_public NDCuboidBuilder(CubeSegment cubeSegment) {
this.cubeSegment = cubeSegment;
this.rowKeySplitter =* new RowKeySplitter(cubeSegment, 65, 256)*;
this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
}_
which will create a temp bytes array with length 256 to fill in rowkey column 
bytes.

While, in class MergeCuboidMapper it's initialized with length 255. 
_rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);_
*
So, if a dimension is encoded in fixed length and the length is 256. The cube 
building job will succeed. While, the merge job will always fail.*

public void doMap(Text key, Text value, Context context) throws 
IOException, InterruptedException {
   _ long cuboidID = rowKeySplitter.split(key.getBytes());_
Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);

in method doMap, it will invoke RowKeySplitter.split(byte[] bytes):
_for (int i = 0; i < cuboid.getColumns().size(); i++) {
splitOffsets[i] = offset;
TblColRef col = cuboid.getColumns().get(i);
int colLength = colIO.getColumnLength(col);
SplittedBytes split = this.splitBuffers[this.bufferSize++];
split.length = colLength;
   _ System.arraycopy(bytes, offset, split.value, 0, colLength);_
offset += colLength;
}_
Method System.arraycopy will result in IndexOutOfBoundsException exception, if 
a column length is 256 in bytes and is being copied to a bytes array with 
length 255.

The incompatibility is also occurred in class FilterRecommendCuboidDataMapper, 
initialize RowkeySplitter as: 
_rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);_

I think the better way is to always set the max split length as 256.
And actually dimension encoded in fix length 256 is pretty common in our 
production. Since in Hive, type varchar(256) is pretty common, users does have 
not much knowledge will prefer to chose fix length encoding on such dimensions, 
and set max length as 256. 







> Incompatible RowKeySplitter initialize between build and merge job
> --
>
> Key: KYLIN-3115
> URL: https://issues.apache.org/jira/browse/KYLIN-3115
> Project: Kylin
>  Issue Type: Bug
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>Priority: Minor
>
> In class NDCuboidBuilder. 
> public NDCuboidBuilder(CubeSegment cubeSegment) {
> this.cubeSegment = cubeSegment;
> this.rowKeySplitter =* new RowKeySplitter(cubeSegment, 65, 256)*;
>

[jira] [Updated] (KYLIN-3115) Incompatible RowKeySplitter initialize between build and merge job

2017-12-17 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-3115:
--
Description: 
In class NDCuboidBuilder. 
public NDCuboidBuilder(CubeSegment cubeSegment) {
this.cubeSegment = cubeSegment;
this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256);
this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
}
which will create a temp bytes array with length 256 to fill in rowkey column 
bytes.

While, in class MergeCuboidMapper it's initialized with length 255. 
rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);

So, if a dimension is encoded in fixed length and the max length is set to 256. 
The cube building job will succeed. While, the merge job will always fail. 
Since in class MergeCuboidMapper method doMap:
public void doMap(Text key, Text value, Context context) throws 
IOException, InterruptedException {
long cuboidID = rowKeySplitter.split(key.getBytes());
Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);

in method doMap, it will invoke method RowKeySplitter.split(byte[] bytes):
for (int i = 0; i < cuboid.getColumns().size(); i++) {
splitOffsets[i] = offset;
TblColRef col = cuboid.getColumns().get(i);
int colLength = colIO.getColumnLength(col);
SplittedBytes split = this.splitBuffers[this.bufferSize++];
split.length = colLength;
System.arraycopy(bytes, offset, split.value, 0, colLength);_
offset += colLength;
}
Method System.arraycopy will result in IndexOutOfBoundsException exception, if 
a column value length is 256 in bytes and is being copied to a bytes array with 
length 255.

The incompatibility is also occurred in class FilterRecommendCuboidDataMapper, 
initialize RowkeySplitter as: 
rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);

I think the better way is to always set the max split length as 256.
And actually dimension encoded in fix length 256 is pretty common in our 
production. Since in Hive, type varchar(256) is pretty common, users does have 
not much knowledge will prefer to chose fix length encoding on such dimensions, 
and set max length as 256. 






  was:
In class NDCuboidBuilder. 
public NDCuboidBuilder(CubeSegment cubeSegment) {
this.cubeSegment = cubeSegment;
this.rowKeySplitter =* new RowKeySplitter(cubeSegment, 65, 256)*;
this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
}
which will create a temp bytes array with length 256 to fill in rowkey column 
bytes.

While, in class MergeCuboidMapper it's initialized with length 255. 
rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);

So, if a dimension is encoded in fixed length and the max length is set to 256. 
The cube building job will succeed. While, the merge job will always fail. 
Since in class MergeCuboidMapper method doMap:
public void doMap(Text key, Text value, Context context) throws 
IOException, InterruptedException {
long cuboidID = rowKeySplitter.split(key.getBytes());
Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);

in method doMap, it will invoke method RowKeySplitter.split(byte[] bytes):
for (int i = 0; i < cuboid.getColumns().size(); i++) {
splitOffsets[i] = offset;
TblColRef col = cuboid.getColumns().get(i);
int colLength = colIO.getColumnLength(col);
SplittedBytes split = this.splitBuffers[this.bufferSize++];
split.length = colLength;
System.arraycopy(bytes, offset, split.value, 0, colLength);_
offset += colLength;
}
Method System.arraycopy will result in IndexOutOfBoundsException exception, if 
a column value length is 256 in bytes and is being copied to a bytes array with 
length 255.

The incompatibility is also occurred in class FilterRecommendCuboidDataMapper, 
initialize RowkeySplitter as: 
rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);

I think the better way is to always set the max split length as 256.
And actually dimension encoded in fix length 256 is pretty common in our 
production. Since in Hive, type varchar(256) is pretty common, users does have 
not much knowledge will prefer to chose fix length encoding on such dimensions, 
and set max length as 256. 







> Incompatible RowKeySplitter initialize between build and merge job
> --
>
> Key: KYLIN-3115
> URL: https://issues.apache.org/jira/browse/KYLIN-3115
> Project: Kylin
>  Issue Type: Bug
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>Priority: Minor
>
> In class NDCuboidBuilder. 
> public NDCuboidBuilder(CubeSegment cubeSegment) {
> this.cubeSegment = cubeSegment;
> this.rowKeySpli

[jira] [Updated] (KYLIN-3115) Incompatible RowKeySplitter initialize between build and merge job

2017-12-17 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-3115:
--
Description: 
In class NDCuboidBuilder. 
_public NDCuboidBuilder(CubeSegment cubeSegment) {
this.cubeSegment = cubeSegment;
this.rowKeySplitter =* new RowKeySplitter(cubeSegment, 65, 256)*;
this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
}_
which will create a temp bytes array with length 256 to fill in rowkey column 
bytes.

While, in class MergeCuboidMapper it's initialized with length 255. 
_rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);_
*
So, if a dimension is encoded in fixed length and the length is 256. The cube 
building job will succeed. While, the merge job will always fail.*

public void doMap(Text key, Text value, Context context) throws 
IOException, InterruptedException {
   _ long cuboidID = rowKeySplitter.split(key.getBytes());_
Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);

in method doMap, it will invoke RowKeySplitter.split(byte[] bytes):
_for (int i = 0; i < cuboid.getColumns().size(); i++) {
splitOffsets[i] = offset;
TblColRef col = cuboid.getColumns().get(i);
int colLength = colIO.getColumnLength(col);
SplittedBytes split = this.splitBuffers[this.bufferSize++];
split.length = colLength;
   _ System.arraycopy(bytes, offset, split.value, 0, colLength);_
offset += colLength;
}_
Method System.arraycopy will result in IndexOutOfBoundsException exception, if 
a column length is 256 in bytes and is being copied to a bytes array with 
length 255.

The incompatibility is also occurred in class FilterRecommendCuboidDataMapper, 
initialize RowkeySplitter as: 
_rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);_

I think the better way is to always set the max split length as 256.
And actually dimension encoded in fix length 256 is pretty common in our 
production. Since in Hive, type varchar(256) is pretty common, users does have 
not much knowledge will prefer to chose fix length encoding on such dimensions, 
and set max length as 256. 






  was:
In class NDCuboidBuilder. 
_public NDCuboidBuilder(CubeSegment cubeSegment) {
this.cubeSegment = cubeSegment;
this.rowKeySplitter =* new RowKeySplitter(cubeSegment, 65, 256)*;
this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
}_
which will create a temp bytes array with length 256 to fill in rowkey column 
bytes.

While, in class MergeCuboidMapper it's initialized with length 255. 
_rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);_

So, if a dimension is encoded in fixed length and the length is 256. The cube 
building job will succeed. While, the merge job will always fail.
public void doMap(Text key, Text value, Context context) throws 
IOException, InterruptedException {
   _ long cuboidID = rowKeySplitter.split(key.getBytes());_
Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);

in method doMap, it will invoke RowKeySplitter.split(byte[] bytes):
_// rowkey columns
for (int i = 0; i < cuboid.getColumns().size(); i++) {
splitOffsets[i] = offset;
TblColRef col = cuboid.getColumns().get(i);
int colLength = colIO.getColumnLength(col);
SplittedBytes split = this.splitBuffers[this.bufferSize++];
split.length = colLength;
   _ System.arraycopy(bytes, offset, split.value, 0, colLength);_
offset += colLength;
}_
Method System.arraycopy will result in IndexOutOfBoundsException exception, if 
a column length is 256 in bytes and is being copied to a bytes array with 
length 255.

The incompatibility is also occurred in class FilterRecommendCuboidDataMapper, 
initialize RowkeySplitter as: 
rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);

I think the better way is to always set the max split length as 256.
And actually dimension encoded in fix length 256 is pretty common in our 
production. Since in Hive, type varchar(256) is pretty common, users does have 
not much knowledge will prefer to chose fix length encoding on such dimensions, 
and set max length as 256. 







> Incompatible RowKeySplitter initialize between build and merge job
> --
>
> Key: KYLIN-3115
> URL: https://issues.apache.org/jira/browse/KYLIN-3115
> Project: Kylin
>  Issue Type: Bug
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>Priority: Minor
>
> In class NDCuboidBuilder. 
> _public NDCuboidBuilder(CubeSegment cubeSegment) {
> this.cubeSegment = cubeSegment;
> this.rowKeySplitter =* new RowKeySplitter(cubeSegment, 65, 256)*;
> this.rowKeyEncoderProvider = new Ro

[jira] [Updated] (KYLIN-3115) Incompatible RowKeySplitter initialize between build and merge job

2017-12-17 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-3115:
--
Priority: Minor  (was: Major)

> Incompatible RowKeySplitter initialize between build and merge job
> --
>
> Key: KYLIN-3115
> URL: https://issues.apache.org/jira/browse/KYLIN-3115
> Project: Kylin
>  Issue Type: Bug
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>Priority: Minor
>
> In class NDCuboidBuilder: 
> public NDCuboidBuilder(CubeSegment cubeSegment) {
>     this.cubeSegment = cubeSegment;
>     this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256);
>     this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
> }
> which creates a temp byte array of length 256 to fill in the rowkey column 
> bytes.
> While in class MergeCuboidMapper it is initialized with length 255: 
> rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);
> So, if a dimension is encoded in fixed length and the length is 256, the cube 
> building job will succeed while the merge job will always fail.
> public void doMap(Text key, Text value, Context context) throws 
> IOException, InterruptedException {
>     long cuboidID = rowKeySplitter.split(key.getBytes());
>     Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);
> In method doMap, it invokes RowKeySplitter.split(byte[] bytes):
> // rowkey columns
> for (int i = 0; i < cuboid.getColumns().size(); i++) {
>     splitOffsets[i] = offset;
>     TblColRef col = cuboid.getColumns().get(i);
>     int colLength = colIO.getColumnLength(col);
>     SplittedBytes split = this.splitBuffers[this.bufferSize++];
>     split.length = colLength;
>     System.arraycopy(bytes, offset, split.value, 0, colLength);
>     offset += colLength;
> }
> System.arraycopy will throw an IndexOutOfBoundsException if a column is 256 
> bytes long and is being copied into a byte array of length 255.
> The incompatibility also occurs in class FilterRecommendCuboidDataMapper, 
> which initializes the RowKeySplitter as: 
> rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);
> I think the better way is to always set the max split length to 256. And 
> dimensions encoded in a fixed length of 256 are actually pretty common in our 
> production: since the type varchar(256) is pretty common in Hive, users 
> without much background knowledge will prefer to choose fixed-length encoding 
> on such dimensions and set the max length to 256. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (KYLIN-3115) Incompatible RowKeySplitter initialize between build and merge job

2017-12-17 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang reassigned KYLIN-3115:
-

Assignee: Wang, Gang

> Incompatible RowKeySplitter initialize between build and merge job
> --
>
> Key: KYLIN-3115
> URL: https://issues.apache.org/jira/browse/KYLIN-3115
> Project: Kylin
>  Issue Type: Bug
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>
> In class NDCuboidBuilder: 
> public NDCuboidBuilder(CubeSegment cubeSegment) {
>     this.cubeSegment = cubeSegment;
>     this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256);
>     this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
> }
> which creates a temp byte array of length 256 to fill in the rowkey column 
> bytes.
> While in class MergeCuboidMapper it is initialized with length 255: 
> rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);
> So, if a dimension is encoded in fixed length and the length is 256, the cube 
> building job will succeed while the merge job will always fail.
> public void doMap(Text key, Text value, Context context) throws 
> IOException, InterruptedException {
>     long cuboidID = rowKeySplitter.split(key.getBytes());
>     Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);
> In method doMap, it invokes RowKeySplitter.split(byte[] bytes):
> // rowkey columns
> for (int i = 0; i < cuboid.getColumns().size(); i++) {
>     splitOffsets[i] = offset;
>     TblColRef col = cuboid.getColumns().get(i);
>     int colLength = colIO.getColumnLength(col);
>     SplittedBytes split = this.splitBuffers[this.bufferSize++];
>     split.length = colLength;
>     System.arraycopy(bytes, offset, split.value, 0, colLength);
>     offset += colLength;
> }
> System.arraycopy will throw an IndexOutOfBoundsException if a column is 256 
> bytes long and is being copied into a byte array of length 255.
> The incompatibility also occurs in class FilterRecommendCuboidDataMapper, 
> which initializes the RowKeySplitter as: 
> rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);
> I think the better way is to always set the max split length to 256. And 
> dimensions encoded in a fixed length of 256 are actually pretty common in our 
> production: since the type varchar(256) is pretty common in Hive, users 
> without much background knowledge will prefer to choose fixed-length encoding 
> on such dimensions and set the max length to 256. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KYLIN-3115) Incompatible RowKeySplitter initialize between build and merge job

2017-12-17 Thread Wang, Gang (JIRA)
Wang, Gang created KYLIN-3115:
-

 Summary: Incompatible RowKeySplitter initialize between build and 
merge job
 Key: KYLIN-3115
 URL: https://issues.apache.org/jira/browse/KYLIN-3115
 Project: Kylin
  Issue Type: Bug
Reporter: Wang, Gang


In class NDCuboidBuilder: 
public NDCuboidBuilder(CubeSegment cubeSegment) {
    this.cubeSegment = cubeSegment;
    this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256);
    this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment);
}
which creates a temp byte array of length 256 to fill in the rowkey column 
bytes.

While in class MergeCuboidMapper it is initialized with length 255: 
rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255);

So, if a dimension is encoded in fixed length and the length is 256, the cube 
building job will succeed while the merge job will always fail.
public void doMap(Text key, Text value, Context context) throws 
IOException, InterruptedException {
    long cuboidID = rowKeySplitter.split(key.getBytes());
    Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID);

In method doMap, it invokes RowKeySplitter.split(byte[] bytes):
// rowkey columns
for (int i = 0; i < cuboid.getColumns().size(); i++) {
    splitOffsets[i] = offset;
    TblColRef col = cuboid.getColumns().get(i);
    int colLength = colIO.getColumnLength(col);
    SplittedBytes split = this.splitBuffers[this.bufferSize++];
    split.length = colLength;
    System.arraycopy(bytes, offset, split.value, 0, colLength);
    offset += colLength;
}
System.arraycopy will throw an IndexOutOfBoundsException if a column is 256 
bytes long and is being copied into a byte array of length 255.

The incompatibility also occurs in class FilterRecommendCuboidDataMapper, which 
initializes the RowKeySplitter as: 
rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255);

I think the better way is to always set the max split length to 256.
And dimensions encoded in a fixed length of 256 are actually pretty common in 
our production: since the type varchar(256) is pretty common in Hive, users 
without much background knowledge will prefer to choose fixed-length encoding 
on such dimensions and set the max length to 256. 








--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (KYLIN-2956) building trie dictionary blocked on value of length over 4095

2017-12-17 Thread Wang, Gang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294449#comment-16294449
 ] 

Wang, Gang edited comment on KYLIN-2956 at 12/18/17 2:16 AM:
-

I think that when building a trie dictionary, 32767 is too large as the value 
length limit; 8191 should make sense. Fix the mask to '0xE000'.


was (Author: gwang3):
I think when building trie dictionary, 32767 is too huge as the value length 
limit, 8191 should make length. Fix as '0xE000'.

> building trie dictionary blocked on value of length over 4095 
> --
>
> Key: KYLIN-2956
> URL: https://issues.apache.org/jira/browse/KYLIN-2956
> Project: Kylin
>  Issue Type: Bug
>  Components: General
>Reporter: Wang, Gang
>Assignee: Wang, Gang
> Attachments: 
> 0001-KYLIN-2956-building-trie-dictionary-blocked-on-value.patch
>
>
> In the new release, Kylin checks the value length when building a trie 
> dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through 
> the method:
> private void positiveShortPreCheck(int i, String fieldName) {
>     if (!BytesUtil.isPositiveShort(i)) {
>         throw new IllegalStateException(fieldName + " is not positive short, 
> usually caused by too long dict value.");
>     }
> }
> public static boolean isPositiveShort(int i) {
>     return (i & 0x7000) == 0;
> }
> And 0x7000 in binary is 0000 0000 0000 0000 0111 0000 0000 0000, so the value 
> length should be less than 0000 0000 0000 0000 0001 0000 0000 0000, which is 
> 4096, i.e. at most 4095 in decimal.
> I wonder why it is 0x7000. Should 0x8000 (0000 0000 0000 0000 1000 0000 0000 
> 0000), supporting a max length of 0000 0000 0000 0000 0111 1111 1111 1111 
> (32767), be what you want? 
> Or, as 32767 may be too large, I prefer to use 0xE000 (0000 0000 0000 0000 
> 1110 0000 0000 0000), supporting a max length of 0000 0000 0000 0000 0001 
> 1111 1111 1111 (8191). 
>  
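
For reference, a small self-contained Java sketch of the mask arithmetic 
discussed above (the class and method names here are illustrative assumptions, 
not Kylin's actual code; only the 0x7000 and 0xE000 masks come from the 
report):

public class DictValueLengthMaskDemo {
    // Current check: any length with a bit of 0x7000 set is rejected,
    // so the largest accepted length is 0x0FFF = 4095.
    static boolean passesCurrentCheck(int len) {
        return (len & 0x7000) == 0;
    }

    // Proposed check from the comment: mask 0xE000 accepts lengths
    // up to 0x1FFF = 8191.
    static boolean passesProposedCheck(int len) {
        return (len & 0xE000) == 0;
    }

    public static void main(String[] args) {
        System.out.println(passesCurrentCheck(4095));  // true
        System.out.println(passesCurrentCheck(4096));  // false (rejected today)
        System.out.println(passesProposedCheck(8191)); // true  (would pass)
        System.out.println(passesProposedCheck(8192)); // false (still rejected)
    }
}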



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KYLIN-2956) building trie dictionary blocked on value of length over 4095

2017-12-17 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-2956:
--
Attachment: 0001-KYLIN-2956-building-trie-dictionary-blocked-on-value.patch

I think that when building a trie dictionary, 32767 is too large as the value 
length limit; 8191 should make sense. Fix the mask to '0xE000'.

> building trie dictionary blocked on value of length over 4095 
> --
>
> Key: KYLIN-2956
> URL: https://issues.apache.org/jira/browse/KYLIN-2956
> Project: Kylin
>  Issue Type: Bug
>  Components: General
>Reporter: Wang, Gang
>Assignee: Wang, Gang
> Attachments: 
> 0001-KYLIN-2956-building-trie-dictionary-blocked-on-value.patch
>
>
> In the new release, Kylin checks the value length when building a trie 
> dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through 
> the method:
> private void positiveShortPreCheck(int i, String fieldName) {
>     if (!BytesUtil.isPositiveShort(i)) {
>         throw new IllegalStateException(fieldName + " is not positive short, 
> usually caused by too long dict value.");
>     }
> }
> public static boolean isPositiveShort(int i) {
>     return (i & 0x7000) == 0;
> }
> And 0x7000 in binary is 0000 0000 0000 0000 0111 0000 0000 0000, so the value 
> length should be less than 0000 0000 0000 0000 0001 0000 0000 0000, which is 
> 4096, i.e. at most 4095 in decimal.
> I wonder why it is 0x7000. Should 0x8000 (0000 0000 0000 0000 1000 0000 0000 
> 0000), supporting a max length of 0000 0000 0000 0000 0111 1111 1111 1111 
> (32767), be what you want? 
> Or, as 32767 may be too large, I prefer to use 0xE000 (0000 0000 0000 0000 
> 1110 0000 0000 0000), supporting a max length of 0000 0000 0000 0000 0001 
> 1111 1111 1111 (8191). 
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2903) support cardinality calculation for Hive view

2017-11-29 Thread Wang, Gang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16272025#comment-16272025
 ] 

Wang, Gang commented on KYLIN-2903:
---

Thanks, Shaofeng. Sorry, I have been busy these past weeks. I will work out a 
patch in the next one or two weeks.


Sent from my iPhone

-- Original --
From: Shaofeng SHI (JIRA) 
Date: Wednesday, November 29, 2017, 9:34 AM
To: 405611081 <405611...@qq.com>
Subject: Re: [jira] [Commented] (KYLIN-2903) support cardinality calculation 
for Hive view




[ 
https://issues.apache.org/jira/browse/KYLIN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269891#comment-16269891
 ] 

Shaofeng SHI commented on KYLIN-2903:
-

Hi Wang Gang, yes this is a known issue in Kylin. Would you like to contribute 
a patch for this? Thanks for making Kylin better!




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


> support cardinality calculation for Hive view
> -
>
> Key: KYLIN-2903
> URL: https://issues.apache.org/jira/browse/KYLIN-2903
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>Priority: Minor
>
> Currently, Kylin leverages HCatalog to calculate column cardinality for Hive 
> tables. However, HCatalog does not actually support Hive views. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (KYLIN-2956) building trie dictionary blocked on value of length over 4095

2017-10-29 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang reassigned KYLIN-2956:
-

Assignee: Wang, Gang  (was: Shaofeng SHI)

> building trie dictionary blocked on value of length over 4095 
> --
>
> Key: KYLIN-2956
> URL: https://issues.apache.org/jira/browse/KYLIN-2956
> Project: Kylin
>  Issue Type: Bug
>  Components: General
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>
> In the new release, Kylin checks the value length when building a trie 
> dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through 
> the method:
> private void positiveShortPreCheck(int i, String fieldName) {
>     if (!BytesUtil.isPositiveShort(i)) {
>         throw new IllegalStateException(fieldName + " is not positive short, 
> usually caused by too long dict value.");
>     }
> }
> public static boolean isPositiveShort(int i) {
>     return (i & 0x7000) == 0;
> }
> And 0x7000 in binary is 0000 0000 0000 0000 0111 0000 0000 0000, so the value 
> length should be less than 0000 0000 0000 0000 0001 0000 0000 0000, which is 
> 4096, i.e. at most 4095 in decimal.
> I wonder why it is 0x7000. Should 0x8000 (0000 0000 0000 0000 1000 0000 0000 
> 0000), supporting a max length of 0000 0000 0000 0000 0111 1111 1111 1111 
> (32767), be what you want? 
> Or, as 32767 may be too large, I prefer to use 0xE000 (0000 0000 0000 0000 
> 1110 0000 0000 0000), supporting a max length of 0000 0000 0000 0000 0001 
> 1111 1111 1111 (8191). 
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KYLIN-2956) building trie dictionary blocked on value of length over 4095

2017-10-22 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-2956:
--
Description: 
In the new release, Kylin checks the value length when building a trie 
dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through the 
method:

private void positiveShortPreCheck(int i, String fieldName) {
    if (!BytesUtil.isPositiveShort(i)) {
        throw new IllegalStateException(fieldName + " is not positive short, 
usually caused by too long dict value.");
    }
}
public static boolean isPositiveShort(int i) {
    return (i & 0x7000) == 0;
}

And 0x7000 in binary is 0000 0000 0000 0000 0111 0000 0000 0000, so the value 
length should be less than 0000 0000 0000 0000 0001 0000 0000 0000, which is 
4096, i.e. at most 4095 in decimal.

I wonder why it is 0x7000. Should 0x8000 (0000 0000 0000 0000 1000 0000 0000 
0000), supporting a max length of 0000 0000 0000 0000 0111 1111 1111 1111 
(32767), be what you want? 
Or, as 32767 may be too large, I prefer to use 0xE000 (0000 0000 0000 0000 
1110 0000 0000 0000), supporting a max length of 0000 0000 0000 0000 0001 1111 
1111 1111 (8191). 


  was:
In the new release, Kylin checks the value length when building a trie 
dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through the 
method:

private void positiveShortPreCheck(int i, String fieldName) {
    if (!BytesUtil.isPositiveShort(i)) {
        throw new IllegalStateException(fieldName + " is not positive short, 
usually caused by too long dict value.");
    }
}
public static boolean isPositiveShort(int i) {
    return (i & 0x7000) == 0;
}

And 0x7000 in binary is 0000 0000 0000 0000 0111 0000 0000 0000, so the value 
length should be less than 0000 0000 0000 0000 0001 0000 0000 0000, which is 
4096, i.e. at most 4095 in decimal.

I wonder why it is 0x7000. Should
0x8000: 0000 0000 0000 0000 1000 0000 0000 0000, 
supporting a max length of 0000 0000 0000 0000 0111 1111 1111 1111 (32767), 
be what you want? And 32767 may be too large, so I prefer to use 0xE000:
0xE000: 0000 0000 0000 0000 1110 0000 0000 0000, 
supporting a max length of 0000 0000 0000 0000 0001 1111 1111 1111 (8191). 



> building trie dictionary blocked on value of length over 4095 
> --
>
> Key: KYLIN-2956
> URL: https://issues.apache.org/jira/browse/KYLIN-2956
> Project: Kylin
>  Issue Type: Bug
>  Components: General
>Reporter: Wang, Gang
>Assignee: Shaofeng SHI
>
> In the new release, Kylin checks the value length when building a trie 
> dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through 
> the method:
> private void positiveShortPreCheck(int i, String fieldName) {
>     if (!BytesUtil.isPositiveShort(i)) {
>         throw new IllegalStateException(fieldName + " is not positive short, 
> usually caused by too long dict value.");
>     }
> }
> public static boolean isPositiveShort(int i) {
>     return (i & 0x7000) == 0;
> }
> And 0x7000 in binary is 0000 0000 0000 0000 0111 0000 0000 0000, so the value 
> length should be less than 0000 0000 0000 0000 0001 0000 0000 0000, which is 
> 4096, i.e. at most 4095 in decimal.
> I wonder why it is 0x7000. Should 0x8000 (0000 0000 0000 0000 1000 0000 0000 
> 0000), supporting a max length of 0000 0000 0000 0000 0111 1111 1111 1111 
> (32767), be what you want? 
> Or, as 32767 may be too large, I prefer to use 0xE000 (0000 0000 0000 0000 
> 1110 0000 0000 0000), supporting a max length of 0000 0000 0000 0000 0001 
> 1111 1111 1111 (8191). 
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (KYLIN-2956) building trie dictionary blocked on value of length over 4095

2017-10-22 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang reassigned KYLIN-2956:
-

Assignee: Shaofeng SHI  (was: Wang, Gang)

> building trie dictionary blocked on value of length over 4095 
> --
>
> Key: KYLIN-2956
> URL: https://issues.apache.org/jira/browse/KYLIN-2956
> Project: Kylin
>  Issue Type: Bug
>  Components: General
>Reporter: Wang, Gang
>Assignee: Shaofeng SHI
>
> In the new release, Kylin checks the value length when building a trie 
> dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through 
> the method:
> private void positiveShortPreCheck(int i, String fieldName) {
>     if (!BytesUtil.isPositiveShort(i)) {
>         throw new IllegalStateException(fieldName + " is not positive short, 
> usually caused by too long dict value.");
>     }
> }
> public static boolean isPositiveShort(int i) {
>     return (i & 0x7000) == 0;
> }
> And 0x7000 in binary is 0000 0000 0000 0000 0111 0000 0000 0000, so the value 
> length should be less than 0000 0000 0000 0000 0001 0000 0000 0000, which is 
> 4096, i.e. at most 4095 in decimal.
> I wonder why it is 0x7000. Should
> 0x8000: 0000 0000 0000 0000 1000 0000 0000 0000, 
> supporting a max length of 0000 0000 0000 0000 0111 1111 1111 1111 (32767), 
> be what you want? And 32767 may be too large, so I prefer to use 0xE000:
> 0xE000: 0000 0000 0000 0000 1110 0000 0000 0000, 
> supporting a max length of 0000 0000 0000 0000 0001 1111 1111 1111 (8191). 
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KYLIN-2956) building trie dictionary blocked on value of length over 4095

2017-10-22 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-2956:
--
Description: 
In the new release, Kylin checks the value length when building a trie 
dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through the 
method:

private void positiveShortPreCheck(int i, String fieldName) {
    if (!BytesUtil.isPositiveShort(i)) {
        throw new IllegalStateException(fieldName + " is not positive short, 
usually caused by too long dict value.");
    }
}
public static boolean isPositiveShort(int i) {
    return (i & 0x7000) == 0;
}

And 0x7000 in binary is 0000 0000 0000 0000 0111 0000 0000 0000, so the value 
length should be less than 0000 0000 0000 0000 0001 0000 0000 0000, which is 
4096, i.e. at most 4095 in decimal.

I wonder why it is 0x7000. Should
0x8000: 0000 0000 0000 0000 1000 0000 0000 0000, 
supporting a max length of 0000 0000 0000 0000 0111 1111 1111 1111 (32767), 
be what you want? And 32767 may be too large, so I prefer to use 0xE000:
0xE000: 0000 0000 0000 0000 1110 0000 0000 0000, 
supporting a max length of 0000 0000 0000 0000 0001 1111 1111 1111 (8191). 


  was:
In the new release, Kylin checks the value length when building a trie 
dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through the 
method:
_private void positiveShortPreCheck(int i, String fieldName) {
    if (!BytesUtil.isPositiveShort(i)) {
        throw new IllegalStateException(fieldName + " is not positive short, 
usually caused by too long dict value.");
    }
}_

_public static boolean isPositiveShort(int i) {
    return (i & 0x7000) == 0;
}_

And 0x7000 in binary is 0000 0000 0000 0000 0111 0000 0000 0000, so the value 
length should be less than 0000 0000 0000 0000 0001 0000 0000 0000, which is 
4096, i.e. at most 4095 in decimal.

I wonder why it is 0x7000. Should
0x8000: 0000 0000 0000 0000 1000 0000 0000 0000, 
supporting a max length of 0000 0000 0000 0000 0111 1111 1111 1111 (32767), 
be what you want? And 32767 may be too large, so I prefer to use 0xE000:
0xE000: 0000 0000 0000 0000 1110 0000 0000 0000, 
supporting a max length of 0000 0000 0000 0000 0001 1111 1111 1111 (8191). 



> building trie dictionary blocked on value of length over 4095 
> --
>
> Key: KYLIN-2956
> URL: https://issues.apache.org/jira/browse/KYLIN-2956
> Project: Kylin
>  Issue Type: Bug
>  Components: General
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>
> In the new release, Kylin checks the value length when building a trie 
> dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through 
> the method:
> private void positiveShortPreCheck(int i, String fieldName) {
>     if (!BytesUtil.isPositiveShort(i)) {
>         throw new IllegalStateException(fieldName + " is not positive short, 
> usually caused by too long dict value.");
>     }
> }
> public static boolean isPositiveShort(int i) {
>     return (i & 0x7000) == 0;
> }
> And 0x7000 in binary is 0000 0000 0000 0000 0111 0000 0000 0000, so the value 
> length should be less than 0000 0000 0000 0000 0001 0000 0000 0000, which is 
> 4096, i.e. at most 4095 in decimal.
> I wonder why it is 0x7000. Should
> 0x8000: 0000 0000 0000 0000 1000 0000 0000 0000, 
> supporting a max length of 0000 0000 0000 0000 0111 1111 1111 1111 (32767), 
> be what you want? And 32767 may be too large, so I prefer to use 0xE000:
> 0xE000: 0000 0000 0000 0000 1110 0000 0000 0000, 
> supporting a max length of 0000 0000 0000 0000 0001 1111 1111 1111 (8191). 
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KYLIN-2956) building trie dictionary blocked on value of length over 4095

2017-10-22 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-2956:
--
Description: 
In the new release, Kylin checks the value length when building a trie 
dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through the 
method:
_private void positiveShortPreCheck(int i, String fieldName) {
    if (!BytesUtil.isPositiveShort(i)) {
        throw new IllegalStateException(fieldName + " is not positive short, 
usually caused by too long dict value.");
    }
}_

_public static boolean isPositiveShort(int i) {
    return (i & 0x7000) == 0;
}_

And 0x7000 in binary is 0000 0000 0000 0000 0111 0000 0000 0000, so the value 
length should be less than 0000 0000 0000 0000 0001 0000 0000 0000, which is 
4096, i.e. at most 4095 in decimal.

I wonder why it is 0x7000. Should
0x8000: 0000 0000 0000 0000 1000 0000 0000 0000, 
supporting a max length of 0000 0000 0000 0000 0111 1111 1111 1111 (32767), 
be what you want? And 32767 may be too large, so I prefer to use 0xE000:
0xE000: 0000 0000 0000 0000 1110 0000 0000 0000, 
supporting a max length of 0000 0000 0000 0000 0001 1111 1111 1111 (8191). 


  was:
In the new release, Kylin checks the value length when building a trie 
dictionary, in class _TrieDictionaryBuilder_, method _buildTrieBytes_, through 
the method:
_private void positiveShortPreCheck(int i, String fieldName) {
    if (!BytesUtil.isPositiveShort(i)) {
        throw new IllegalStateException(fieldName + " is not positive short, 
usually caused by too long dict value.");
    }
}_

_public static boolean isPositiveShort(int i) {
    return (i & 0x7000) == 0;
}_

And 0x7000 in binary is 0000 0000 0000 0000 0111 0000 0000 0000, so the value 
length should be less than 0000 0000 0000 0000 0001 0000 0000 0000, which is 
4096, i.e. at most 4095 in decimal.

I wonder why it is 0x7000. Should
0x8000: 0000 0000 0000 0000 1000 0000 0000 0000, 
supporting a max length of 0000 0000 0000 0000 0111 1111 1111 1111 (32767), 
be what you want? And 32767 may be too large, so I prefer to use 0xE000:
0xE000: 0000 0000 0000 0000 1110 0000 0000 0000, 
supporting a max length of 0000 0000 0000 0000 0001 1111 1111 1111 (8191). 



> building trie dictionary blocked on value of length over 4095 
> --
>
> Key: KYLIN-2956
> URL: https://issues.apache.org/jira/browse/KYLIN-2956
> Project: Kylin
>  Issue Type: Bug
>  Components: General
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>
> In the new release, Kylin checks the value length when building a trie 
> dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through 
> the method:
> _private void positiveShortPreCheck(int i, String fieldName) {
>     if (!BytesUtil.isPositiveShort(i)) {
>         throw new IllegalStateException(fieldName + " is not positive short, 
> usually caused by too long dict value.");
>     }
> }_
> _public static boolean isPositiveShort(int i) {
>     return (i & 0x7000) == 0;
> }_
> And 0x7000 in binary is 0000 0000 0000 0000 0111 0000 0000 0000, so the value 
> length should be less than 0000 0000 0000 0000 0001 0000 0000 0000, which is 
> 4096, i.e. at most 4095 in decimal.
> I wonder why it is 0x7000. Should
> 0x8000: 0000 0000 0000 0000 1000 0000 0000 0000, 
> supporting a max length of 0000 0000 0000 0000 0111 1111 1111 1111 (32767), 
> be what you want? And 32767 may be too large, so I prefer to use 0xE000:
> 0xE000: 0000 0000 0000 0000 1110 0000 0000 0000, 
> supporting a max length of 0000 0000 0000 0000 0001 1111 1111 1111 (8191). 
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (KYLIN-2956) building trie dictionary blocked on value of length over 4095

2017-10-22 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang reassigned KYLIN-2956:
-

Assignee: Wang, Gang

> building trie dictionary blocked on value of length over 4095 
> --
>
> Key: KYLIN-2956
> URL: https://issues.apache.org/jira/browse/KYLIN-2956
> Project: Kylin
>  Issue Type: Bug
>  Components: General
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>
> In the new release, Kylin checks the value length when building a trie 
> dictionary, in class _TrieDictionaryBuilder_, method _buildTrieBytes_, 
> through the method:
> _private void positiveShortPreCheck(int i, String fieldName) {
>     if (!BytesUtil.isPositiveShort(i)) {
>         throw new IllegalStateException(fieldName + " is not positive short, 
> usually caused by too long dict value.");
>     }
> }_
> _public static boolean isPositiveShort(int i) {
>     return (i & 0x7000) == 0;
> }_
> And 0x7000 in binary is 0000 0000 0000 0000 0111 0000 0000 0000, so the value 
> length should be less than 0000 0000 0000 0000 0001 0000 0000 0000, which is 
> 4096, i.e. at most 4095 in decimal.
> I wonder why it is 0x7000. Should
> 0x8000: 0000 0000 0000 0000 1000 0000 0000 0000, 
> supporting a max length of 0000 0000 0000 0000 0111 1111 1111 1111 (32767), 
> be what you want? And 32767 may be too large, so I prefer to use 0xE000:
> 0xE000: 0000 0000 0000 0000 1110 0000 0000 0000, 
> supporting a max length of 0000 0000 0000 0000 0001 1111 1111 1111 (8191). 
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-2903) support cardinality calculation for Hive view

2017-10-22 Thread Wang, Gang (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214668#comment-16214668
 ] 

Wang, Gang commented on KYLIN-2903:
---

Will use the HQL statement count(distinct _column_) to calculate _column_ 
cardinality.
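
A hypothetical sketch of what building such a query could look like (the 
class, method, view, and column names are all illustrative assumptions; only 
the count(distinct ...) approach comes from the comment above):

import java.util.Arrays;
import java.util.List;

public class ViewCardinalityQueryBuilder {
    // Build one HQL statement that computes the cardinality of each column;
    // unlike HCatalog-based stats, COUNT(DISTINCT ...) also works on views.
    static String cardinalityHql(String tableOrView, List<String> columns) {
        StringBuilder sql = new StringBuilder("SELECT ");
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) sql.append(", ");
            sql.append("COUNT(DISTINCT ").append(columns.get(i)).append(")");
        }
        return sql.append(" FROM ").append(tableOrView).toString();
    }

    public static void main(String[] args) {
        System.out.println(cardinalityHql("my_hive_view",
                Arrays.asList("seller_id", "site_id")));
        // SELECT COUNT(DISTINCT seller_id), COUNT(DISTINCT site_id) FROM my_hive_view
    }
}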

> support cardinality calculation for Hive view
> -
>
> Key: KYLIN-2903
> URL: https://issues.apache.org/jira/browse/KYLIN-2903
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Reporter: Wang, Gang
>Assignee: Wang, Gang
>Priority: Minor
>
> Currently, Kylin leverages HCatalog to calculate column cardinality for Hive 
> tables. However, HCatalog does not actually support Hive views. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KYLIN-2956) building trie dictionary blocked on value of length over 4095

2017-10-22 Thread Wang, Gang (JIRA)
Wang, Gang created KYLIN-2956:
-

 Summary: building trie dictionary blocked on value of length over 
4095 
 Key: KYLIN-2956
 URL: https://issues.apache.org/jira/browse/KYLIN-2956
 Project: Kylin
  Issue Type: Bug
  Components: General
Reporter: Wang, Gang


In the new release, Kylin checks the value length when building a trie 
dictionary, in class _TrieDictionaryBuilder_, method _buildTrieBytes_, through 
the method:
_private void positiveShortPreCheck(int i, String fieldName) {
    if (!BytesUtil.isPositiveShort(i)) {
        throw new IllegalStateException(fieldName + " is not positive short, 
usually caused by too long dict value.");
    }
}_

_public static boolean isPositiveShort(int i) {
    return (i & 0x7000) == 0;
}_

And 0x7000 in binary is 0000 0000 0000 0000 0111 0000 0000 0000, so the value 
length should be less than 0000 0000 0000 0000 0001 0000 0000 0000, which is 
4096, i.e. at most 4095 in decimal.

I wonder why it is 0x7000. Should
0x8000: 0000 0000 0000 0000 1000 0000 0000 0000, 
supporting a max length of 0000 0000 0000 0000 0111 1111 1111 1111 (32767), 
be what you want? And 32767 may be too large, so I prefer to use 0xE000:
0xE000: 0000 0000 0000 0000 1110 0000 0000 0000, 
supporting a max length of 0000 0000 0000 0000 0001 1111 1111 1111 (8191). 




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KYLIN-2913) Enable job retry for configurable exceptions

2017-09-27 Thread Wang, Gang (JIRA)
Wang, Gang created KYLIN-2913:
-

 Summary: Enable job retry for configurable exceptions
 Key: KYLIN-2913
 URL: https://issues.apache.org/jira/browse/KYLIN-2913
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine
Affects Versions: v2.1.0
Reporter: Wang, Gang
Assignee: Dong Li
 Fix For: v2.2.0


In our production environment, we always get certain exceptions from Hadoop or 
HBase, like "org.apache.kylin.job.exception.NoEnoughReplicationException" or 
"java.util.ConcurrentModificationException", which result in job failure. 
These exceptions can actually be handled by retrying. So it would be much more 
convenient if we were able to make jobs retry on some configurable exceptions.
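
A minimal sketch of the idea (not Kylin's actual implementation; the config 
list and all names here are assumptions): retry a job step only when the 
thrown exception's class name appears in a configurable whitelist.

import java.util.Arrays;
import java.util.List;

public class ConfigurableRetryDemo {
    // Hypothetical configuration value listing retryable exception classes.
    static final List<String> RETRYABLE = Arrays.asList(
            "org.apache.kylin.job.exception.NoEnoughReplicationException",
            "java.util.ConcurrentModificationException");

    static boolean isRetryable(Throwable t) {
        return RETRYABLE.contains(t.getClass().getName());
    }

    static void runWithRetry(Runnable step, int maxRetries) {
        for (int attempt = 0; ; attempt++) {
            try {
                step.run();
                return;                      // success
            } catch (RuntimeException e) {
                if (attempt >= maxRetries || !isRetryable(e)) {
                    throw e;                 // not retryable, or retry budget spent
                }
                // otherwise loop and retry the step
            }
        }
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        runWithRetry(() -> {
            if (++calls[0] < 3) throw new java.util.ConcurrentModificationException();
        }, 5);
        System.out.println("succeeded after " + calls[0] + " attempts");
    }
}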



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KYLIN-2903) support cardinality calculation for Hive view

2017-09-25 Thread Wang, Gang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Gang updated KYLIN-2903:
--
Description: Currently, Kylin leverages HCatalog to calculate column 
cardinality for Hive tables. However, HCatalog does not actually support Hive 
views.  (was: Currently, Kylin leverage HCatlog to calculate column 
cardinality for Hive tables. While, HCatlog does not support Hive view 
actually.)

> support cardinality calculation for Hive view
> -
>
> Key: KYLIN-2903
> URL: https://issues.apache.org/jira/browse/KYLIN-2903
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Reporter: Wang, Gang
>Assignee: Zhong Yanghong
>Priority: Minor
>
> Currently, Kylin leverages HCatalog to calculate column cardinality for Hive 
> tables. However, HCatalog does not actually support Hive views. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KYLIN-2903) support cardinality calculation for Hive view

2017-09-25 Thread Wang, Gang (JIRA)
Wang, Gang created KYLIN-2903:
-

 Summary: support cardinality calculation for Hive view
 Key: KYLIN-2903
 URL: https://issues.apache.org/jira/browse/KYLIN-2903
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine
Reporter: Wang, Gang
Assignee: Dong Li
Priority: Minor


Currently, Kylin leverages HCatalog to calculate column cardinality for Hive 
tables. However, HCatalog does not actually support Hive views.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)