[jira] [Commented] (KYLIN-3140) Auto merge jobs should not block user build jobs
[ https://issues.apache.org/jira/browse/KYLIN-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350485#comment-16350485 ] Wang, Gang commented on KYLIN-3140: --- Yes. The administrator does need to take care of each failed job. My point is that a failed auto merge job should not block users' incremental build jobs. Currently, the problem is that if max-building-segments is set to 1, users cannot submit a new build job until the failed auto merge job is handled properly and resumed successfully. And if we differentiate auto merge jobs from user build/refresh jobs, we can also set a separate concurrency threshold for each of them, which may protect the Kylin server from OOM or other performance issues. > Auto merge jobs should not block user build jobs > > > Key: KYLIN-3140 > URL: https://issues.apache.org/jira/browse/KYLIN-3140 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Reporter: Wang, Gang >Assignee: Shaofeng SHI >Priority: Major > > Although Kylin supports concurrent jobs in the latest version, if the > concurrency is set to 1 there is a possibility that cube build jobs will > deadlock. Say some issue causes a merge job to fail: even if you discard the job, > another job will be launched and fail again due to the auto merge policy, and this > failed merge job blocks users from building incremental segments. > Even if the concurrency is set to larger than 1, the auto merge jobs occupy > part of the concurrency quota. > From the user's perspective, they don't care much about the auto merge jobs, > and the auto merge jobs should not block the build/refresh jobs they > submit manually. > A better way may be to separate the auto merge jobs from the job queue, so that the > parameter max-building-segments only limits jobs submitted by users. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
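The separate-threshold idea in the comment above can be sketched roughly as follows. This is an illustrative sketch, not Kylin's actual scheduler code: `QuotaCheck`, the `JobType` enum, and the two quota parameters are all hypothetical names, standing in for counting running jobs per category and applying a per-category limit so failed auto-merge jobs cannot exhaust the quota that user-submitted build/refresh jobs draw from.

```java
import java.util.List;

public class QuotaCheck {
    enum JobType { USER_BUILD, AUTO_MERGE }

    // Count running jobs of the same category and compare against that
    // category's own quota, so the two pools cannot starve each other.
    static boolean canSubmit(JobType type, List<JobType> running,
                             int userQuota, int mergeQuota) {
        long sameType = running.stream().filter(t -> t == type).count();
        int quota = (type == JobType.USER_BUILD) ? userQuota : mergeQuota;
        return sameType < quota;
    }
}
```

With both quotas at 1, a stuck auto merge job no longer blocks a user build job, which is exactly the behavior the comment asks for.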
[jira] [Closed] (KYLIN-3141) Support offset for segment merge
[ https://issues.apache.org/jira/browse/KYLIN-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang closed KYLIN-3141. - Resolution: Duplicate Duplicate of https://issues.apache.org/jira/browse/KYLIN-1892 > Support offset for segment merge > > > Key: KYLIN-3141 > URL: https://issues.apache.org/jira/browse/KYLIN-3141 > Project: Kylin > Issue Type: New Feature > Components: Job Engine >Reporter: Wang, Gang >Assignee: Wang, Gang >Priority: Minor > > This is a request to add an offset to the Kylin segment merge so as to avoid > immediate merging of segments after a 7 day / 30 day window. > Introducing a delay (offset) would help with 2 things - > a) When auto merge kicks off, I have a new segment, and my daily incremental > segment build script will fail because it won't find the last segment. > b) There are a lot of use cases where I may need to do data backfills for some of the > days of the previous week, but I end up refreshing the whole merged segment > instead of a day or two. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KYLIN-3141) Support offset for segment merge
[ https://issues.apache.org/jira/browse/KYLIN-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350442#comment-16350442 ] Wang, Gang commented on KYLIN-3141: --- Yes! That is what we want. I will close this ticket. > Support offset for segment merge > > > Key: KYLIN-3141 > URL: https://issues.apache.org/jira/browse/KYLIN-3141 > Project: Kylin > Issue Type: New Feature > Components: Job Engine >Reporter: Wang, Gang >Assignee: Wang, Gang >Priority: Minor > > This is a request to add an offset to the Kylin segment merge so as to avoid > immediate merging of segments after a 7 day / 30 day window. > Introducing a delay (offset) would help with 2 things - > a) When auto merge kicks off, I have a new segment, and my daily incremental > segment build script will fail because it won't find the last segment. > b) There are a lot of use cases where I may need to do data backfills for some of the > days of the previous week, but I end up refreshing the whole merged segment > instead of a day or two. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KYLIN-2984) improve the way to delete a job
[ https://issues.apache.org/jira/browse/KYLIN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang updated KYLIN-2984: -- Attachment: 0001-improve-the-way-to-delete-a-job.patch > improve the way to delete a job > --- > > Key: KYLIN-2984 > URL: https://issues.apache.org/jira/browse/KYLIN-2984 > Project: Kylin > Issue Type: Improvement >Reporter: Zhong Yanghong >Assignee: Wang, Gang >Priority: Trivial > Attachments: 0001-improve-the-way-to-delete-a-job.patch > > > Currently user can directly delete a job. However, when the job status is > RUNNING, the related segment in NEW is not deleted. > I think we should not allow user to delete a job not in FINISHED or DISCARDED > state. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-2984) improve the way to delete a job
[ https://issues.apache.org/jira/browse/KYLIN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang updated KYLIN-2984: -- Attachment: (was: 0001-improve-the-way-to-delete-a-job.patch) > improve the way to delete a job > --- > > Key: KYLIN-2984 > URL: https://issues.apache.org/jira/browse/KYLIN-2984 > Project: Kylin > Issue Type: Improvement >Reporter: Zhong Yanghong >Assignee: Wang, Gang >Priority: Trivial > > Currently user can directly delete a job. However, when the job status is > RUNNING, the related segment in NEW is not deleted. > I think we should not allow user to delete a job not in FINISHED or DISCARDED > state. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-2984) improve the way to delete a job
[ https://issues.apache.org/jira/browse/KYLIN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang updated KYLIN-2984: -- Attachment: 0001-improve-the-way-to-delete-a-job.patch > improve the way to delete a job > --- > > Key: KYLIN-2984 > URL: https://issues.apache.org/jira/browse/KYLIN-2984 > Project: Kylin > Issue Type: Improvement >Reporter: Zhong Yanghong >Assignee: Wang, Gang >Priority: Trivial > Attachments: 0001-improve-the-way-to-delete-a-job.patch > > > Currently user can directly delete a job. However, when the job status is > RUNNING, the related segment in NEW is not deleted. > I think we should not allow user to delete a job not in FINISHED or DISCARDED > state. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KYLIN-2984) improve the way to delete a job
[ https://issues.apache.org/jira/browse/KYLIN-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315085#comment-16315085 ] Wang, Gang commented on KYLIN-2984: --- A patch is attached. Please help review it. Thanks. > improve the way to delete a job > --- > > Key: KYLIN-2984 > URL: https://issues.apache.org/jira/browse/KYLIN-2984 > Project: Kylin > Issue Type: Improvement >Reporter: Zhong Yanghong >Assignee: Wang, Gang >Priority: Trivial > Attachments: 0001-improve-the-way-to-delete-a-job.patch > > > Currently a user can directly delete a job. However, when the job status is > RUNNING, the related segment in the NEW state is not deleted. > I think we should not allow users to delete a job that is not in the FINISHED or DISCARDED > state. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KYLIN-1403) Kylin Hive Column Cardinality Job unable to read bucketed table
[ https://issues.apache.org/jira/browse/KYLIN-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309196#comment-16309196 ] Wang, Gang commented on KYLIN-1403: --- Tested on Hive 1.2 and Kylin 2.1: HCatalog works well with the TXT, Parquet and ORC formats. This may not be an issue anymore.
set hive.enforce.bucketing = true;
set hive.exec.dynamic.partition = true;
set hive.exec.dynamic.partition.mode = nonstrict;
create table testBucket_parquet (x int, y int) partitioned by (z int) clustered by (x) into 10 buckets STORED AS PARQUET;
insert into table testBucket_parquet partition(z) values (1, 1, 1);
insert into table testBucket_parquet partition(z) values (2, 1, 1);
insert into table testBucket_parquet partition(z) values (2, 1, 2);
insert into table testBucket_parquet partition(z) values (1, 1, 2);
> Kylin Hive Column Cardinality Job unable to read bucketed table > --- > > Key: KYLIN-1403 > URL: https://issues.apache.org/jira/browse/KYLIN-1403 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.2, v1.3.0 > Environment: - Tested against > apache-kylin-1.2-HBase1.1-incubating-SNAPSHOT-bin and > apache-kylin-1.3-HBase-1.1-SNAPSHOT-bin > - Environment is HDP 2.3.4 > - Hive version: hive-1.2.1.2.3.4.0 > - HBase version: HBase 1.1.2.2.3.4.0-3485 >Reporter: Sebastian Zimmermann >Assignee: Wang, Gang > Labels: newbie > > This issue is connected with https://issues.apache.org/jira/browse/KYLIN-1402 > and states the findings while investigating the > StringIndexOutOfBoundsException. > While trying to find out why the output file created by the cardinality job is > empty, we discovered that the only difference between this non-working job > and all our other jobs (which work without problems) is that the underlying > table is bucketed. > The data folder is dbfolder/db/table/partition/bucketfolder/file > Kylin checks for data in dbfolder/db/table/partition and so is unable to find > the data.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KYLIN-2903) Support cardinality calculation for Hive view
[ https://issues.apache.org/jira/browse/KYLIN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309078#comment-16309078 ] Wang, Gang commented on KYLIN-2903: --- Yes, Shaofeng. I will take this ticket. > Support cardinality calculation for Hive view > - > > Key: KYLIN-2903 > URL: https://issues.apache.org/jira/browse/KYLIN-2903 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Reporter: Wang, Gang >Assignee: Wang, Gang >Priority: Minor > Attachments: > 0001-KYLIN-2903-support-cardinality-calculation-for-Hive-.patch > > > Currently, Kylin leverages HCatalog to calculate column cardinality for Hive > tables. However, HCatalog does not actually support Hive views. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (KYLIN-1403) Kylin Hive Column Cardinality Job unable to read bucketed table
[ https://issues.apache.org/jira/browse/KYLIN-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang reassigned KYLIN-1403: - Assignee: Wang, Gang (was: hongbin ma) > Kylin Hive Column Cardinality Job unable to read bucketed table > --- > > Key: KYLIN-1403 > URL: https://issues.apache.org/jira/browse/KYLIN-1403 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.2, v1.3.0 > Environment: - Tested against > apache-kylin-1.2-HBase1.1-incubating-SNAPSHOT-bin and > apache-kylin-1.3-HBase-1.1-SNAPSHOT-bin > - Environment is HDP 2.3.4 > - Hive version: hive-1.2.1.2.3.4.0 > - HBase version: HBase 1.1.2.2.3.4.0-3485 >Reporter: Sebastian Zimmermann >Assignee: Wang, Gang > Labels: newbie > > This issue is connected with https://issues.apache.org/jira/browse/KYLIN-1402 > and states the findings while investigating on the > StringIndexOutOfBoundsException. > While trying to find out why the outputfile created in the cardinality job is > empty, we discovered that the only difference between this non-working job > and all our other jobs (which work without problems), is that the underlying > table is bucketed. > The data folder is dbfolder/db/table/partition/bucketfolder/file > Kylin checks for data in dbfolder/db/table/partition and so is unable to find > the data. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (KYLIN-3141) Support offset for segment merge
[ https://issues.apache.org/jira/browse/KYLIN-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang reassigned KYLIN-3141: - Assignee: Wang, Gang Component/s: Job Engine > Support offset for segment merge > > > Key: KYLIN-3141 > URL: https://issues.apache.org/jira/browse/KYLIN-3141 > Project: Kylin > Issue Type: New Feature > Components: Job Engine >Reporter: Wang, Gang >Assignee: Wang, Gang >Priority: Minor > > This is a request to add an offset to the Kylin segment merge so as to avoid > immediate merge of segments after a 7 day / 30 day window. > Introducing a delay(offset) would help in 2 things - > a) When auto merge kicks off I have a new segment and my daily incremental > segment build script will fail because they won’t find the last segment. > b) Lot of use cases where I may need to do data backfills for some of the > days of the previous week, but I end up refreshing the whole merged segment > instead of a day or two. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KYLIN-3141) Support offset for segment merge
Wang, Gang created KYLIN-3141: - Summary: Support offset for segment merge Key: KYLIN-3141 URL: https://issues.apache.org/jira/browse/KYLIN-3141 Project: Kylin Issue Type: New Feature Reporter: Wang, Gang Priority: Minor This is a request to add an offset to the Kylin segment merge so as to avoid immediate merging of segments after a 7 day / 30 day window. Introducing a delay (offset) would help with 2 things - a) When auto merge kicks off, I have a new segment, and my daily incremental segment build script will fail because it won't find the last segment. b) There are a lot of use cases where I may need to do data backfills for some of the days of the previous week, but I end up refreshing the whole merged segment instead of a day or two. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
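The requested behavior could look roughly like the sketch below. It is illustrative only, not part of Kylin: `MergeOffset`, `eligibleForMerge`, and the millisecond-based parameters are hypothetical names. The idea is simply that when selecting segments for auto merge, any segment newer than "now - offset" is skipped, so the most recent days stay unmerged and can still be refreshed or rebuilt individually.

```java
import java.util.List;
import java.util.stream.Collectors;

public class MergeOffset {
    // Returns the segment end times (epoch millis) old enough to be merged:
    // everything newer than (now - offset) is left out of the merge window.
    static List<Long> eligibleForMerge(List<Long> segmentEndTimes,
                                       long nowMillis, long offsetMillis) {
        long cutoff = nowMillis - offsetMillis;
        return segmentEndTimes.stream()
                .filter(end -> end <= cutoff)
                .collect(Collectors.toList());
    }
}
```

With a one-day offset, the segment that ended "today" is excluded, so a daily incremental build script still finds the last segment untouched.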
[jira] [Created] (KYLIN-3140) Auto merge jobs should not block user build jobs
Wang, Gang created KYLIN-3140: - Summary: Auto merge jobs should not block user build jobs Key: KYLIN-3140 URL: https://issues.apache.org/jira/browse/KYLIN-3140 Project: Kylin Issue Type: Improvement Components: Job Engine Reporter: Wang, Gang Assignee: Shaofeng SHI Although Kylin supports concurrent jobs in the latest version, if the concurrency is set to 1 there is a possibility that cube build jobs will deadlock. Say some issue causes a merge job to fail: even if you discard the job, another job will be launched and fail again due to the auto merge policy, and this failed merge job blocks users from building incremental segments. Even if the concurrency is set to larger than 1, the auto merge jobs occupy part of the concurrency quota. From the user's perspective, they don't care much about the auto merge jobs, and the auto merge jobs should not block the build/refresh jobs they submit manually. A better way may be to separate the auto merge jobs from the job queue, so that the parameter max-building-segments only limits jobs submitted by users. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KYLIN-2903) Support cardinality calculation for Hive view
[ https://issues.apache.org/jira/browse/KYLIN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306028#comment-16306028 ] Wang, Gang commented on KYLIN-2903: --- Sorry for the late response. Since the calculation depends on an MR job, it is quite slow compared with HyperLogLog. But the calculation happens on Hive table loading (or on a manual trigger), and there is usually some time before cube building, so the duration may be acceptable. Also, the cardinality is pretty valuable for cube builders who have little knowledge of the data in the Hive tables. I think this may be one way. > Support cardinality calculation for Hive view > - > > Key: KYLIN-2903 > URL: https://issues.apache.org/jira/browse/KYLIN-2903 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Reporter: Wang, Gang >Assignee: Wang, Gang >Priority: Minor > Attachments: > 0001-KYLIN-2903-support-cardinality-calculation-for-Hive-.patch > > > Currently, Kylin leverages HCatalog to calculate column cardinality for Hive > tables. However, HCatalog does not actually support Hive views. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-2913) Enable job retry for configurable exceptions
[ https://issues.apache.org/jira/browse/KYLIN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang updated KYLIN-2913: -- Attachment: 0001-KYLIN-2913-Enable-job-retry-for-configurable-excepti.patch Makes sense. Resubmitted a patch. Retry will happen in the cases below: 1) if the property "kylin.job.retry-exception-classes" is not set or is null, all jobs that hit exceptions will be retried up to the configured retry count. 2) if the property "kylin.job.retry-exception-classes" is set and is not null, only jobs that hit the specified exceptions will be retried up to the configured retry count. > Enable job retry for configurable exceptions > > > Key: KYLIN-2913 > URL: https://issues.apache.org/jira/browse/KYLIN-2913 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Affects Versions: v2.1.0 >Reporter: Wang, Gang >Assignee: Wang, Gang >Priority: Minor > Fix For: v2.3.0 > > Attachments: > 0001-KYLIN-2913-Enable-job-retry-for-configurable-excepti.patch > > > In our production environment, we often get certain exceptions from > Hadoop or HBase, like > "org.apache.kylin.job.exception.NoEnoughReplicationException" or > "java.util.ConcurrentModificationException", which result in job failure. > These exceptions can actually be handled by retrying. So it would be much > more convenient if we were able to make jobs retry on some configurable > exceptions. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
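The two retry cases described above might be sketched like this. Only the property name "kylin.job.retry-exception-classes" comes from the ticket; `RetryPolicy` and its methods are hypothetical names for illustration, not the code in the attached patch:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class RetryPolicy {
    private final Set<String> retryableClasses;

    public RetryPolicy(String configuredClasses) {
        if (configuredClasses == null || configuredClasses.trim().isEmpty()) {
            // Case 1: property unset/null -> any failed job is retried.
            retryableClasses = null;
        } else {
            // Case 2: property set -> retry only the listed exception classes.
            retryableClasses =
                new HashSet<>(Arrays.asList(configuredClasses.split("\\s*,\\s*")));
        }
    }

    public boolean shouldRetry(Throwable t) {
        return retryableClasses == null
                || retryableClasses.contains(t.getClass().getName());
    }
}
```

The actual retry count would still be bounded by the existing retry-times setting; this sketch only decides whether an exception qualifies for a retry at all.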
[jira] [Updated] (KYLIN-2913) Enable job retry for configurable exceptions
[ https://issues.apache.org/jira/browse/KYLIN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang updated KYLIN-2913: -- Attachment: (was: 0001-KYLIN-2913-Enable-job-retry-for-configurable-excepti.patch) > Enable job retry for configurable exceptions > > > Key: KYLIN-2913 > URL: https://issues.apache.org/jira/browse/KYLIN-2913 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Affects Versions: v2.1.0 >Reporter: Wang, Gang >Assignee: Wang, Gang >Priority: Minor > Fix For: v2.3.0 > > > In our production environment, we always get some certain exceptions from > Hadoop or HBase, like > "org.apache.kylin.job.exception.NoEnoughReplicationException", > "java.util.ConcurrentModificationException", which results in job failure. > While, these exceptions can be handled by retry actually. So, it will be much > more convenient if we are able to make job retry on some configurable > exceptions. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KYLIN-2956) building trie dictionary blocked on value of length over 4095
[ https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299649#comment-16299649 ] Wang, Gang commented on KYLIN-2956: --- Resubmitted a patch: set the mask to 0x8000 and added a UT. > building trie dictionary blocked on value of length over 4095 > -- > > Key: KYLIN-2956 > URL: https://issues.apache.org/jira/browse/KYLIN-2956 > Project: Kylin > Issue Type: Bug > Components: General >Reporter: Wang, Gang >Assignee: Wang, Gang > Attachments: > 0001-KYLIN-2956-building-trie-dictionary-blocked-on-value.patch > > > In the new release, Kylin checks the value length when building a trie > dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through > method: > private void positiveShortPreCheck(int i, String fieldName) { > if (!BytesUtil.isPositiveShort(i)) { > throw new IllegalStateException(fieldName + " is not positive short, > usually caused by too long dict value."); > } > } > public static boolean isPositiveShort(int i) { > return (i & 0x7000) == 0; > } > And 0x7000 in binary is 0111 0000 0000 0000, so the > value length must be less than 0001 0000 0000 0000, i.e. at most > 4095 in decimal. > I wonder why it is 0x7000. Should 0x8000 (1000 0000 0000 0000), > supporting max length 0111 1111 1111 1111 (32767), > be what you want? > Or, if 32767 is too large, I would prefer 0xE000 > (1110 0000 0000 0000), supporting max length 0001 1111 1111 1111 > (8191). > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
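The three masks discussed in the ticket can be compared directly. In this sketch, `MaskCheck` is an illustrative class; only the quoted `(i & 0x7000) == 0` logic comes from BytesUtil, and the two alternative masks are the ones proposed in the comment:

```java
public class MaskCheck {
    // Quoted check: 0x7000 = 0111 0000 0000 0000 -> accepts lengths 0..4095
    // (bits 12-14 must be clear, so 4096 and above fail).
    static boolean passes0x7000(int i) { return (i & 0x7000) == 0; }

    // Proposed: 0x8000 = 1000 0000 0000 0000 -> accepts 0..32767
    // for 16-bit values.
    static boolean passes0x8000(int i) { return (i & 0x8000) == 0; }

    // Alternative: 0xE000 = 1110 0000 0000 0000 -> accepts 0..8191.
    static boolean passes0xE000(int i) { return (i & 0xE000) == 0; }
}
```

Checking the boundary values confirms the limits quoted in the ticket: 4095, 32767, and 8191 respectively.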
[jira] [Updated] (KYLIN-2956) building trie dictionary blocked on value of length over 4095
[ https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang updated KYLIN-2956: -- Attachment: 0001-KYLIN-2956-building-trie-dictionary-blocked-on-value.patch > building trie dictionary blocked on value of length over 4095 > -- > > Key: KYLIN-2956 > URL: https://issues.apache.org/jira/browse/KYLIN-2956 > Project: Kylin > Issue Type: Bug > Components: General >Reporter: Wang, Gang >Assignee: Wang, Gang > Attachments: > 0001-KYLIN-2956-building-trie-dictionary-blocked-on-value.patch > > > In the new release, Kylin will check the value length when building trie > dictionary, in class TrieDictionaryBuilder method buildTrieBytes, through > method: > private void positiveShortPreCheck(int i, String fieldName) { > if (!BytesUtil.isPositiveShort(i)) { > throw new IllegalStateException(fieldName + " is not positive short, > usually caused by too long dict value."); > } > } > public static boolean isPositiveShort(int i) { > return (i & 0x7000) == 0; > } > And 0x7000 in binary: 0111 0000 0000 0000, so the > value length should be less than 0001 0000 0000 0000, > value 4095 in decimal. > I wonder why is 0x7000, should 0x8000 (1000 0000 0000 0000), support max length: 0111 1111 1111 1111 > (32767) > be what you want? > Or 32767 may be too large, I prefer use 0xE000 ( > 1110 0000 0000 0000), support max length: 0001 1111 1111 1111 > (8191) > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-2956) building trie dictionary blocked on value of length over 4095
[ https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang updated KYLIN-2956: -- Attachment: (was: 0001-KYLIN-2956-building-trie-dictionary-blocked-on-value.patch) > building trie dictionary blocked on value of length over 4095 > -- > > Key: KYLIN-2956 > URL: https://issues.apache.org/jira/browse/KYLIN-2956 > Project: Kylin > Issue Type: Bug > Components: General >Reporter: Wang, Gang >Assignee: Wang, Gang > > In the new release, Kylin will check the value length when building trie > dictionary, in class TrieDictionaryBuilder method buildTrieBytes, through > method: > private void positiveShortPreCheck(int i, String fieldName) { > if (!BytesUtil.isPositiveShort(i)) { > throw new IllegalStateException(fieldName + " is not positive short, > usually caused by too long dict value."); > } > } > public static boolean isPositiveShort(int i) { > return (i & 0x7000) == 0; > } > And 0x7000 in binary: 0111 0000 0000 0000, so the > value length should be less than 0001 0000 0000 0000, > value 4095 in decimal. > I wonder why is 0x7000, should 0x8000 (1000 0000 0000 0000), support max length: 0111 1111 1111 1111 > (32767) > be what you want? > Or 32767 may be too large, I prefer use 0xE000 ( > 1110 0000 0000 0000), support max length: 0001 1111 1111 1111 > (8191) > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-2903) support cardinality calculation for Hive view
[ https://issues.apache.org/jira/browse/KYLIN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang updated KYLIN-2903: -- Attachment: 0001-KYLIN-2903-support-cardinality-calculation-for-Hive-.patch Add UT. > support cardinality calculation for Hive view > - > > Key: KYLIN-2903 > URL: https://issues.apache.org/jira/browse/KYLIN-2903 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Reporter: Wang, Gang >Assignee: Wang, Gang >Priority: Minor > Attachments: > 0001-KYLIN-2903-support-cardinality-calculation-for-Hive-.patch > > > Currently, Kylin leverage HCatlog to calculate column cardinality for Hive > tables. While, HCatlog does not support Hive view actually. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-2903) support cardinality calculation for Hive view
[ https://issues.apache.org/jira/browse/KYLIN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang updated KYLIN-2903: -- Attachment: (was: 0001-KYLIN-2903-support-cardinality-calculation-for-Hive-.patch) > support cardinality calculation for Hive view > - > > Key: KYLIN-2903 > URL: https://issues.apache.org/jira/browse/KYLIN-2903 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Reporter: Wang, Gang >Assignee: Wang, Gang >Priority: Minor > > Currently, Kylin leverage HCatlog to calculate column cardinality for Hive > tables. While, HCatlog does not support Hive view actually. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-2903) support cardinality calculation for Hive view
[ https://issues.apache.org/jira/browse/KYLIN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang updated KYLIN-2903: -- Attachment: 0001-KYLIN-2903-support-cardinality-calculation-for-Hive-.patch Attached a patch. One way is to leverage an HQL 'COUNT DISTINCT' statement to calculate the column cardinality, and use 'INSERT OVERWRITE DIRECTORY' to put the result in the output path. To make it recognizable to the following step, HiveColumnCardinalityUpdateJob, the output needs to follow this format, one column per line: column1 cardinality column2 cardinality column3 cardinality ... This can be achieved by setting 'ROW FORMAT DELIMITED' and adding line breaks in the HQL. > support cardinality calculation for Hive view > - > > Key: KYLIN-2903 > URL: https://issues.apache.org/jira/browse/KYLIN-2903 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Reporter: Wang, Gang >Assignee: Wang, Gang >Priority: Minor > Attachments: > 0001-KYLIN-2903-support-cardinality-calculation-for-Hive-.patch > > > Currently, Kylin leverages HCatalog to calculate column cardinality for Hive > tables. However, HCatalog does not actually support Hive views. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
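The approach described above might look like the following HQL sketch. This is a hedged illustration, not the attached patch: `db.view_name`, `column1`, `column2`, and the output path are hypothetical names, and the subquery wrapping around UNION ALL is just one portable way to emit one "column<TAB>cardinality" row per line.

```sql
-- One COUNT(DISTINCT ...) per column, written as tab-delimited lines in the
-- "column <TAB> cardinality" layout that HiveColumnCardinalityUpdateJob reads.
INSERT OVERWRITE DIRECTORY '/tmp/cardinality/db.view_name'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
SELECT * FROM (
  SELECT 'column1' AS col_name, COUNT(DISTINCT column1) AS cardinality
  FROM db.view_name
  UNION ALL
  SELECT 'column2', COUNT(DISTINCT column2)
  FROM db.view_name
) t;
```

Because this runs as ordinary HQL, it works equally for tables and views, sidestepping the HCatalog limitation the ticket describes.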
[jira] [Updated] (KYLIN-2913) Enable job retry for configurable exceptions
[ https://issues.apache.org/jira/browse/KYLIN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang updated KYLIN-2913: -- Priority: Minor (was: Major) > Enable job retry for configurable exceptions > > > Key: KYLIN-2913 > URL: https://issues.apache.org/jira/browse/KYLIN-2913 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Affects Versions: v2.1.0 >Reporter: Wang, Gang >Assignee: Wang, Gang >Priority: Minor > Fix For: v2.3.0 > > Attachments: > 0001-KYLIN-2913-Enable-job-retry-for-configurable-excepti.patch > > > In our production environment, we always get some certain exceptions from > Hadoop or HBase, like > "org.apache.kylin.job.exception.NoEnoughReplicationException", > "java.util.ConcurrentModificationException", which results in job failure. > While, these exceptions can be handled by retry actually. So, it will be much > more convenient if we are able to make job retry on some configurable > exceptions. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-2913) Enable job retry for configurable exceptions
[ https://issues.apache.org/jira/browse/KYLIN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang updated KYLIN-2913: -- Attachment: 0001-KYLIN-2913-Enable-job-retry-for-configurable-excepti.patch Add property "kylin.job.retry-exception-classes" to configure retryable exceptions. Patch is attached. > Enable job retry for configurable exceptions > > > Key: KYLIN-2913 > URL: https://issues.apache.org/jira/browse/KYLIN-2913 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Affects Versions: v2.1.0 >Reporter: Wang, Gang >Assignee: Wang, Gang > Fix For: v2.3.0 > > Attachments: > 0001-KYLIN-2913-Enable-job-retry-for-configurable-excepti.patch > > > In our production environment, we always get some certain exceptions from > Hadoop or HBase, like > "org.apache.kylin.job.exception.NoEnoughReplicationException", > "java.util.ConcurrentModificationException", which results in job failure. > While, these exceptions can be handled by retry actually. So, it will be much > more convenient if we are able to make job retry on some configurable > exceptions. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-3115) Incompatible RowKeySplitter initialize between build and merge job
[ https://issues.apache.org/jira/browse/KYLIN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang updated KYLIN-3115: -- Component/s: Job Engine > Incompatible RowKeySplitter initialize between build and merge job > -- > > Key: KYLIN-3115 > URL: https://issues.apache.org/jira/browse/KYLIN-3115 > Project: Kylin > Issue Type: Bug > Components: Job Engine >Reporter: Wang, Gang >Assignee: Wang, Gang >Priority: Minor > > In class NDCuboidBuilder: > public NDCuboidBuilder(CubeSegment cubeSegment) { > this.cubeSegment = cubeSegment; > this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256); > this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment); > } > which creates a byte array of length 256 to hold the rowkey column > bytes. > However, in class MergeCuboidMapper it is initialized with length 255: > rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255); > So, if a dimension is encoded at fixed length with the max length set to > 256, the cube build job will succeed while the merge job will always > fail, since in class MergeCuboidMapper method doMap: > public void doMap(Text key, Text value, Context context) throws > IOException, InterruptedException { > long cuboidID = rowKeySplitter.split(key.getBytes()); > Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID); > it will invoke method RowKeySplitter.split(byte[] bytes): > for (int i = 0; i < cuboid.getColumns().size(); i++) { > splitOffsets[i] = offset; > TblColRef col = cuboid.getColumns().get(i); > int colLength = colIO.getColumnLength(col); > SplittedBytes split = this.splitBuffers[this.bufferSize++]; > split.length = colLength; > System.arraycopy(bytes, offset, split.value, 0, colLength); > offset += colLength; > } > Method System.arraycopy will throw an IndexOutOfBoundsException > if a column value is 256 bytes long and is being copied to a byte array > of length 255.
> The incompatibility also occurs in class > FilterRecommendCuboidDataMapper, which initializes the RowKeySplitter as: > rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255); > I think the better way is to always set the max split length to 256. > And dimensions encoded at fixed length 256 are actually pretty common in our > production: since the type varchar(256) is pretty common in Hive, users who do not have > much Kylin knowledge will prefer to choose fixed-length encoding for such > dimensions and set the max length to 256. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
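The buffer-length mismatch described in the ticket can be reproduced in isolation. `SplitBufferDemo` below is a minimal illustration, not the actual Kylin classes: it shows that a column encoded at fixed length 256 fits the build side's 256-byte split buffer but overflows the merge side's 255-byte buffer inside System.arraycopy.

```java
public class SplitBufferDemo {
    // Copies one rowkey column into a freshly allocated split buffer, as the
    // quoted RowKeySplitter.split loop does with its pre-sized SplittedBytes.
    static void copyColumn(byte[] rowkey, int offset, int colLength,
                           int bufferBytes) {
        byte[] splitBuffer = new byte[bufferBytes];
        // Throws IndexOutOfBoundsException when colLength > bufferBytes,
        // which is exactly the 256-vs-255 merge-job failure.
        System.arraycopy(rowkey, offset, splitBuffer, 0, colLength);
    }
}
```

A 256-byte column copied into a 256-byte buffer succeeds (the build path), while the same copy into a 255-byte buffer fails (the merge path), which is why aligning both sides on 256 fixes the incompatibility.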
[jira] [Updated] (KYLIN-3115) Incompatible RowKeySplitter initialize between build and merge job
[ https://issues.apache.org/jira/browse/KYLIN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang updated KYLIN-3115: -- Description: In class NDCuboidBuilder: public NDCuboidBuilder(CubeSegment cubeSegment) { this.cubeSegment = cubeSegment; this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256); this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment); } which will create a bytes array with length 256 to fill in rowkey column bytes. While, in class MergeCuboidMapper it's initialized with length 255. rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255); So, if a dimension is encoded in fixed length and the max length is set to 256. The cube building job will succeed. While, the merge job will always fail. Since in class MergeCuboidMapper method doMap: public void doMap(Text key, Text value, Context context) throws IOException, InterruptedException { long cuboidID = rowKeySplitter.split(key.getBytes()); Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID); in method doMap, it will invoke method RowKeySplitter.split(byte[] bytes): for (int i = 0; i < cuboid.getColumns().size(); i++) { splitOffsets[i] = offset; TblColRef col = cuboid.getColumns().get(i); int colLength = colIO.getColumnLength(col); SplittedBytes split = this.splitBuffers[this.bufferSize++]; split.length = colLength; System.arraycopy(bytes, offset, split.value, 0, colLength); offset += colLength; } Method System.arraycopy will result in IndexOutOfBoundsException exception, if a column value length is 256 in bytes and is being copied to a bytes array with length 255. The incompatibility is also occurred in class FilterRecommendCuboidDataMapper, initialize RowkeySplitter as: rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255); I think the better way is to always set the max split length as 256. And actually dimension encoded in fix length 256 is pretty common in our production. 
Since the type varchar(256) is quite common in Hive, users without much Kylin knowledge will tend to choose fixed-length encoding for such dimensions and set the max length to 256. > Incompatible RowKeySplitter initialize between build and merge job > -- > > Key: KYLIN-3115 > URL: https://issues.apache.org/jira/browse/KYLIN-3115 > Project: Kylin > Issue Type: Bug >Reporter: Wang, Gang >Assignee: Wang, Gang >Priority: Minor > > In class NDCuboidBuilder: > public NDCuboidBuilder(CubeSegment cubeSegment) { > this.cubeSegment = cub
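The buffer-size mismatch described above can be reproduced outside Kylin with a minimal standalone sketch (plain Java, not Kylin code; the class and method names here are hypothetical). It imitates what RowKeySplitter.split() does internally: copying a fixed-length column value into a pre-allocated split buffer.

```java
public class SplitBufferDemo {
    // Copy a column value of colLength bytes into a split buffer of
    // bufferCapacity bytes, as RowKeySplitter.split() does via
    // System.arraycopy. Returns false if the copy overflows the buffer.
    static boolean copyFits(int colLength, int bufferCapacity) {
        byte[] rowkeyColumn = new byte[colLength];
        byte[] splitValue = new byte[bufferCapacity];
        try {
            System.arraycopy(rowkeyColumn, 0, splitValue, 0, colLength);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Build side allocates 256-byte buffers: a 256-byte value fits.
        System.out.println(copyFits(256, 256)); // true
        // Merge side allocates 255-byte buffers: the same value overflows.
        System.out.println(copyFits(256, 255)); // false
    }
}
```

This is why a fixed-length-256 dimension builds fine but always breaks the merge: only the merge-side mapper allocates the smaller 255-byte buffers.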
[jira] [Updated] (KYLIN-3115) Incompatible RowKeySplitter initialize between build and merge job
[ https://issues.apache.org/jira/browse/KYLIN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang updated KYLIN-3115: -- Priority: Minor (was: Major) > Incompatible RowKeySplitter initialize between build and merge job > -- > > Key: KYLIN-3115 > URL: https://issues.apache.org/jira/browse/KYLIN-3115 > Project: Kylin > Issue Type: Bug >Reporter: Wang, Gang >Assignee: Wang, Gang >Priority: Minor > > In class NDCuboidBuilder: > public NDCuboidBuilder(CubeSegment cubeSegment) { > this.cubeSegment = cubeSegment; > this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256); > this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment); > } > which will create a temp byte array of length 256 to hold the rowkey column > bytes. > In class MergeCuboidMapper, however, it is initialized with length 255: > rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255); > So if a dimension is encoded in fixed length and the length is 256, the cube > build job will succeed while the merge job will always fail. > public void doMap(Text key, Text value, Context context) throws > IOException, InterruptedException { > long cuboidID = rowKeySplitter.split(key.getBytes()); > Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID); > in method doMap, it will invoke RowKeySplitter.split(byte[] bytes): > // rowkey columns > for (int i = 0; i < cuboid.getColumns().size(); i++) { > splitOffsets[i] = offset; > TblColRef col = cuboid.getColumns().get(i); > int colLength = colIO.getColumnLength(col); > SplittedBytes split = this.splitBuffers[this.bufferSize++]; > split.length = colLength; > System.arraycopy(bytes, offset, split.value, 0, colLength); > offset += colLength; > } > Method System.arraycopy will throw an IndexOutOfBoundsException > if a column value 256 bytes long is copied into a byte array of > length 255. 
> The same incompatibility occurs in class > FilterRecommendCuboidDataMapper, which initializes the RowKeySplitter as: > rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255); > I think the better way is to always set the max split length to 256. > Dimensions encoded in fixed length 256 are actually quite common in our > production. Since the type varchar(256) is quite common in Hive, users > without much Kylin knowledge will prefer to choose fixed-length encoding on such > dimensions and set the max length to 256. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (KYLIN-3115) Incompatible RowKeySplitter initialize between build and merge job
[ https://issues.apache.org/jira/browse/KYLIN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang reassigned KYLIN-3115: - Assignee: Wang, Gang > Incompatible RowKeySplitter initialize between build and merge job > -- > > Key: KYLIN-3115 > URL: https://issues.apache.org/jira/browse/KYLIN-3115 > Project: Kylin > Issue Type: Bug >Reporter: Wang, Gang >Assignee: Wang, Gang > > In class NDCuboidBuilder: > public NDCuboidBuilder(CubeSegment cubeSegment) { > this.cubeSegment = cubeSegment; > this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256); > this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment); > } > which will create a temp byte array of length 256 to hold the rowkey column > bytes. > In class MergeCuboidMapper, however, it is initialized with length 255: > rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255); > So if a dimension is encoded in fixed length and the length is 256, the cube > build job will succeed while the merge job will always fail. > public void doMap(Text key, Text value, Context context) throws > IOException, InterruptedException { > long cuboidID = rowKeySplitter.split(key.getBytes()); > Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID); > in method doMap, it will invoke RowKeySplitter.split(byte[] bytes): > // rowkey columns > for (int i = 0; i < cuboid.getColumns().size(); i++) { > splitOffsets[i] = offset; > TblColRef col = cuboid.getColumns().get(i); > int colLength = colIO.getColumnLength(col); > SplittedBytes split = this.splitBuffers[this.bufferSize++]; > split.length = colLength; > System.arraycopy(bytes, offset, split.value, 0, colLength); > offset += colLength; > } > Method System.arraycopy will throw an IndexOutOfBoundsException > if a column value 256 bytes long is copied into a byte array of > length 255. 
> The same incompatibility occurs in class > FilterRecommendCuboidDataMapper, which initializes the RowKeySplitter as: > rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255); > I think the better way is to always set the max split length to 256. > Dimensions encoded in fixed length 256 are actually quite common in our > production. Since the type varchar(256) is quite common in Hive, users > without much Kylin knowledge will prefer to choose fixed-length encoding on such > dimensions and set the max length to 256. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KYLIN-3115) Incompatible RowKeySplitter initialize between build and merge job
Wang, Gang created KYLIN-3115: - Summary: Incompatible RowKeySplitter initialize between build and merge job Key: KYLIN-3115 URL: https://issues.apache.org/jira/browse/KYLIN-3115 Project: Kylin Issue Type: Bug Reporter: Wang, Gang In class NDCuboidBuilder: public NDCuboidBuilder(CubeSegment cubeSegment) { this.cubeSegment = cubeSegment; this.rowKeySplitter = new RowKeySplitter(cubeSegment, 65, 256); this.rowKeyEncoderProvider = new RowKeyEncoderProvider(cubeSegment); } which will create a temp byte array of length 256 to hold the rowkey column bytes. In class MergeCuboidMapper, however, it is initialized with length 255: rowKeySplitter = new RowKeySplitter(sourceCubeSegment, 65, 255); So if a dimension is encoded in fixed length and the length is 256, the cube build job will succeed while the merge job will always fail. public void doMap(Text key, Text value, Context context) throws IOException, InterruptedException { long cuboidID = rowKeySplitter.split(key.getBytes()); Cuboid cuboid = Cuboid.findForMandatory(cubeDesc, cuboidID); in method doMap, it will invoke RowKeySplitter.split(byte[] bytes): // rowkey columns for (int i = 0; i < cuboid.getColumns().size(); i++) { splitOffsets[i] = offset; TblColRef col = cuboid.getColumns().get(i); int colLength = colIO.getColumnLength(col); SplittedBytes split = this.splitBuffers[this.bufferSize++]; split.length = colLength; System.arraycopy(bytes, offset, split.value, 0, colLength); offset += colLength; } Method System.arraycopy will throw an IndexOutOfBoundsException if a column value 256 bytes long is copied into a byte array of length 255. The same incompatibility occurs in class FilterRecommendCuboidDataMapper, which initializes the RowKeySplitter as: rowKeySplitter = new RowKeySplitter(originalSegment, 65, 255); I think the better way is to always set the max split length to 256. Dimensions encoded in fixed length 256 are actually quite common in our production. 
Since the type varchar(256) is quite common in Hive, users without much Kylin knowledge will prefer to choose fixed-length encoding on such dimensions and set the max length to 256. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (KYLIN-2956) building trie dictionary blocked on value of length over 4095
[ https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294449#comment-16294449 ] Wang, Gang edited comment on KYLIN-2956 at 12/18/17 2:16 AM: - I think when building the trie dictionary, 32767 is too huge as the value length limit; 8191 should make sense. Fix as '0xE000'. was (Author: gwang3): I think when building trie dictionary, 32767 is too huge as the value length limit, 8191 should make length. Fix as '0xE000'. > building trie dictionary blocked on value of length over 4095 > -- > > Key: KYLIN-2956 > URL: https://issues.apache.org/jira/browse/KYLIN-2956 > Project: Kylin > Issue Type: Bug > Components: General >Reporter: Wang, Gang >Assignee: Wang, Gang > Attachments: > 0001-KYLIN-2956-building-trie-dictionary-blocked-on-value.patch > > > In the new release, Kylin checks the value length when building the trie > dictionary, in class TrieDictionaryBuilder method buildTrieBytes, through: > private void positiveShortPreCheck(int i, String fieldName) { > if (!BytesUtil.isPositiveShort(i)) { > throw new IllegalStateException(fieldName + " is not positive short, > usually caused by too long dict value."); > } > } > public static boolean isPositiveShort(int i) { > return (i & 0x7000) == 0; > } > And 0x7000 in binary is 0111 0000 0000 0000, so the > value length must be less than 0001 0000 0000 0000 (4096), > i.e. at most 4095 in decimal. > I wonder why it is 0x7000; should 0x8000 (in binary 1000 > 0000 0000 0000, supporting max length 0111 1111 1111 1111, i.e. 32767) > be what you want? > Or 32767 may be too large; I prefer 0xE000 (in binary > 1110 0000 0000 0000, supporting max length 0001 1111 1111 1111, i.e. 8191). > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-2956) building trie dictionary blocked on value of length over 4095
[ https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang updated KYLIN-2956: -- Attachment: 0001-KYLIN-2956-building-trie-dictionary-blocked-on-value.patch I think when building the trie dictionary, 32767 is too huge as the value length limit; 8191 should make sense. Fix as '0xE000'. > building trie dictionary blocked on value of length over 4095 > -- > > Key: KYLIN-2956 > URL: https://issues.apache.org/jira/browse/KYLIN-2956 > Project: Kylin > Issue Type: Bug > Components: General >Reporter: Wang, Gang >Assignee: Wang, Gang > Attachments: > 0001-KYLIN-2956-building-trie-dictionary-blocked-on-value.patch > > > In the new release, Kylin checks the value length when building the trie > dictionary, in class TrieDictionaryBuilder method buildTrieBytes, through: > private void positiveShortPreCheck(int i, String fieldName) { > if (!BytesUtil.isPositiveShort(i)) { > throw new IllegalStateException(fieldName + " is not positive short, > usually caused by too long dict value."); > } > } > public static boolean isPositiveShort(int i) { > return (i & 0x7000) == 0; > } > And 0x7000 in binary is 0111 0000 0000 0000, so the > value length must be less than 0001 0000 0000 0000 (4096), > i.e. at most 4095 in decimal. > I wonder why it is 0x7000; should 0x8000 (in binary 1000 > 0000 0000 0000, supporting max length 0111 1111 1111 1111, i.e. 32767) > be what you want? > Or 32767 may be too large; I prefer 0xE000 (in binary > 1110 0000 0000 0000, supporting max length 0001 1111 1111 1111, i.e. 8191). > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
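The effect of the two masks can be checked with a small standalone sketch (plain Java; the class name is hypothetical, and `passes` mirrors the logic of BytesUtil.isPositiveShort with the mask turned into a parameter so the current 0x7000 check and the proposed 0xE000 check can be compared side by side):

```java
public class LengthMaskDemo {
    // Mirrors BytesUtil.isPositiveShort: a length passes the check only if
    // none of the mask's bits are set in it.
    static boolean passes(int length, int mask) {
        return (length & mask) == 0;
    }

    public static void main(String[] args) {
        // Current check (0x7000 = 0111 0000 0000 0000): caps lengths at 4095.
        System.out.println(passes(4095, 0x7000)); // true
        System.out.println(passes(4096, 0x7000)); // false: 4096 = 0x1000 hits the mask
        // Proposed check (0xE000 = 1110 0000 0000 0000): allows up to 8191.
        System.out.println(passes(8191, 0xE000)); // true
        System.out.println(passes(8192, 0xE000)); // false: 8192 = 0x2000 hits the mask
    }
}
```

This makes the trade-off in the issue concrete: 0x7000 blocks any dictionary value longer than 4095 bytes, while 0xE000 raises the limit to 8191 without going all the way to 32767.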
[jira] [Commented] (KYLIN-2903) support cardinality calculation for Hive view
[ https://issues.apache.org/jira/browse/KYLIN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16272025#comment-16272025 ] Wang, Gang commented on KYLIN-2903: --- Thanks Shaofeng. Sorry for being busy in the past weeks. I will work out a patch in the next one or two weeks. Sent from my iPhone -- Original -- From: Shaofeng SHI (JIRA) Date: Wed, Nov 29, 2017 9:34 AM To: 405611081 <405611...@qq.com> Subject: Re: [jira] [Commented] (KYLIN-2903) support cardinality calculation for Hive view [ https://issues.apache.org/jira/browse/KYLIN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269891#comment-16269891 ] Shaofeng SHI commented on KYLIN-2903: - Hi Wang Gang, yes this is a known issue in Kylin. Would you like to contribute a patch for this? Thanks for making Kylin better! -- This message was sent by Atlassian JIRA (v6.4.14#64029) > support cardinality calculation for Hive view > - > > Key: KYLIN-2903 > URL: https://issues.apache.org/jira/browse/KYLIN-2903 > Project: Kylin > Issue Type: Improvement > Components: Job Engine >Reporter: Wang, Gang >Assignee: Wang, Gang >Priority: Minor > > Currently, Kylin leverages HCatalog to calculate column cardinality for Hive > tables. However, HCatalog does not actually support Hive views. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (KYLIN-2956) building trie dictionary blocked on value of length over 4095
[ https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang reassigned KYLIN-2956:
-
Assignee: Wang, Gang (was: Shaofeng SHI)

> building trie dictionary blocked on value of length over 4095
> --
>
> Key: KYLIN-2956
> URL: https://issues.apache.org/jira/browse/KYLIN-2956
> Project: Kylin
> Issue Type: Bug
> Components: General
> Reporter: Wang, Gang
> Assignee: Wang, Gang
>
> In the new release, Kylin checks the value length when building the trie
> dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through:
>
> private void positiveShortPreCheck(int i, String fieldName) {
>     if (!BytesUtil.isPositiveShort(i)) {
>         throw new IllegalStateException(fieldName + " is not positive short, usually caused by too long dict value.");
>     }
> }
>
> public static boolean isPositiveShort(int i) {
>     return (i & 0x7000) == 0;
> }
>
> And 0x7000 in binary is 0111 0000 0000 0000, so the value length must be
> less than 0001 0000 0000 0000 (4096), i.e. at most 4095 in decimal.
> I wonder why it is 0x7000. Should 0x8000 (1000 0000 0000 0000), supporting
> max length 0111 1111 1111 1111 (32767), be what you want?
> Or, since 32767 may be too large, I prefer to use 0xE000
> (1110 0000 0000 0000), supporting max length 0001 1111 1111 1111 (8191).
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-2956) building trie dictionary blocked on value of length over 4095
[ https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang updated KYLIN-2956:
--
Description:
In the new release, Kylin checks the value length when building the trie dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through:

private void positiveShortPreCheck(int i, String fieldName) {
    if (!BytesUtil.isPositiveShort(i)) {
        throw new IllegalStateException(fieldName + " is not positive short, usually caused by too long dict value.");
    }
}

public static boolean isPositiveShort(int i) {
    return (i & 0x7000) == 0;
}

And 0x7000 in binary is 0111 0000 0000 0000, so the value length must be less than 0001 0000 0000 0000 (4096), i.e. at most 4095 in decimal. I wonder why it is 0x7000. Should 0x8000 (1000 0000 0000 0000), supporting max length 0111 1111 1111 1111 (32767), be what you want? Or, since 32767 may be too large, I prefer to use 0xE000 (1110 0000 0000 0000), supporting max length 0001 1111 1111 1111 (8191).

was:
In the new release, Kylin checks the value length when building the trie dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through:

private void positiveShortPreCheck(int i, String fieldName) {
    if (!BytesUtil.isPositiveShort(i)) {
        throw new IllegalStateException(fieldName + " is not positive short, usually caused by too long dict value.");
    }
}

public static boolean isPositiveShort(int i) {
    return (i & 0x7000) == 0;
}

And 0x7000 in binary is 0111 0000 0000 0000, so the value length must be less than 0001 0000 0000 0000 (4096), i.e. at most 4095 in decimal. I wonder why it is 0x7000. Should 0x8000 (1000 0000 0000 0000), supporting max length 0111 1111 1111 1111 (32767), be what you want? And 32767 may be too large; I prefer to use 0xE000 (1110 0000 0000 0000), supporting max length 0001 1111 1111 1111 (8191).

> building trie dictionary blocked on value of length over 4095
> --
>
> Key: KYLIN-2956
> URL: https://issues.apache.org/jira/browse/KYLIN-2956
> Project: Kylin
> Issue Type: Bug
> Components: General
> Reporter: Wang, Gang
> Assignee: Shaofeng SHI
>
> In the new release, Kylin checks the value length when building the trie
> dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through:
>
> private void positiveShortPreCheck(int i, String fieldName) {
>     if (!BytesUtil.isPositiveShort(i)) {
>         throw new IllegalStateException(fieldName + " is not positive short, usually caused by too long dict value.");
>     }
> }
>
> public static boolean isPositiveShort(int i) {
>     return (i & 0x7000) == 0;
> }
>
> And 0x7000 in binary is 0111 0000 0000 0000, so the value length must be
> less than 0001 0000 0000 0000 (4096), i.e. at most 4095 in decimal.
> I wonder why it is 0x7000. Should 0x8000 (1000 0000 0000 0000), supporting
> max length 0111 1111 1111 1111 (32767), be what you want?
> Or, since 32767 may be too large, I prefer to use 0xE000
> (1110 0000 0000 0000), supporting max length 0001 1111 1111 1111 (8191).
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (KYLIN-2956) building trie dictionary blocked on value of length over 4095
[ https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang reassigned KYLIN-2956:
-
Assignee: Shaofeng SHI (was: Wang, Gang)

> building trie dictionary blocked on value of length over 4095
> --
>
> Key: KYLIN-2956
> URL: https://issues.apache.org/jira/browse/KYLIN-2956
> Project: Kylin
> Issue Type: Bug
> Components: General
> Reporter: Wang, Gang
> Assignee: Shaofeng SHI
>
> In the new release, Kylin checks the value length when building the trie
> dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through:
>
> private void positiveShortPreCheck(int i, String fieldName) {
>     if (!BytesUtil.isPositiveShort(i)) {
>         throw new IllegalStateException(fieldName + " is not positive short, usually caused by too long dict value.");
>     }
> }
>
> public static boolean isPositiveShort(int i) {
>     return (i & 0x7000) == 0;
> }
>
> And 0x7000 in binary is 0111 0000 0000 0000, so the value length must be
> less than 0001 0000 0000 0000 (4096), i.e. at most 4095 in decimal.
> I wonder why it is 0x7000. Should 0x8000 (1000 0000 0000 0000), supporting
> max length 0111 1111 1111 1111 (32767), be what you want?
> And 32767 may be too large; I prefer to use 0xE000
> (1110 0000 0000 0000), supporting max length 0001 1111 1111 1111 (8191).
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-2956) building trie dictionary blocked on value of length over 4095
[ https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang updated KYLIN-2956:
--
Description:
In the new release, Kylin checks the value length when building the trie dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through:

private void positiveShortPreCheck(int i, String fieldName) {
    if (!BytesUtil.isPositiveShort(i)) {
        throw new IllegalStateException(fieldName + " is not positive short, usually caused by too long dict value.");
    }
}

public static boolean isPositiveShort(int i) {
    return (i & 0x7000) == 0;
}

And 0x7000 in binary is 0111 0000 0000 0000, so the value length must be less than 0001 0000 0000 0000 (4096), i.e. at most 4095 in decimal. I wonder why it is 0x7000. Should 0x8000 (1000 0000 0000 0000), supporting max length 0111 1111 1111 1111 (32767), be what you want? And 32767 may be too large; I prefer to use 0xE000 (1110 0000 0000 0000), supporting max length 0001 1111 1111 1111 (8191).

was:
In the new release, Kylin checks the value length when building the trie dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through:

_private void positiveShortPreCheck(int i, String fieldName) {
    if (!BytesUtil.isPositiveShort(i)) {
        throw new IllegalStateException(fieldName + " is not positive short, usually caused by too long dict value.");
    }
}_

_public static boolean isPositiveShort(int i) {
    return (i & 0x7000) == 0;
}_

And 0x7000 in binary is 0111 0000 0000 0000, so the value length must be less than 0001 0000 0000 0000 (4096), i.e. at most 4095 in decimal. I wonder why it is 0x7000. Should 0x8000 (1000 0000 0000 0000), supporting max length 0111 1111 1111 1111 (32767), be what you want? And 32767 may be too large; I prefer to use 0xE000 (1110 0000 0000 0000), supporting max length 0001 1111 1111 1111 (8191).

> building trie dictionary blocked on value of length over 4095
> --
>
> Key: KYLIN-2956
> URL: https://issues.apache.org/jira/browse/KYLIN-2956
> Project: Kylin
> Issue Type: Bug
> Components: General
> Reporter: Wang, Gang
> Assignee: Wang, Gang
>
> In the new release, Kylin checks the value length when building the trie
> dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through:
>
> private void positiveShortPreCheck(int i, String fieldName) {
>     if (!BytesUtil.isPositiveShort(i)) {
>         throw new IllegalStateException(fieldName + " is not positive short, usually caused by too long dict value.");
>     }
> }
>
> public static boolean isPositiveShort(int i) {
>     return (i & 0x7000) == 0;
> }
>
> And 0x7000 in binary is 0111 0000 0000 0000, so the value length must be
> less than 0001 0000 0000 0000 (4096), i.e. at most 4095 in decimal.
> I wonder why it is 0x7000. Should 0x8000 (1000 0000 0000 0000), supporting
> max length 0111 1111 1111 1111 (32767), be what you want?
> And 32767 may be too large; I prefer to use 0xE000
> (1110 0000 0000 0000), supporting max length 0001 1111 1111 1111 (8191).
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KYLIN-2956) building trie dictionary blocked on value of length over 4095
[ https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang updated KYLIN-2956:
--
Description:
In the new release, Kylin checks the value length when building the trie dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through:

_private void positiveShortPreCheck(int i, String fieldName) {
    if (!BytesUtil.isPositiveShort(i)) {
        throw new IllegalStateException(fieldName + " is not positive short, usually caused by too long dict value.");
    }
}_

_public static boolean isPositiveShort(int i) {
    return (i & 0x7000) == 0;
}_

And 0x7000 in binary is 0111 0000 0000 0000, so the value length must be less than 0001 0000 0000 0000 (4096), i.e. at most 4095 in decimal. I wonder why it is 0x7000. Should 0x8000 (1000 0000 0000 0000), supporting max length 0111 1111 1111 1111 (32767), be what you want? And 32767 may be too large; I prefer to use 0xE000 (1110 0000 0000 0000), supporting max length 0001 1111 1111 1111 (8191).

was:
In the new release, Kylin checks the value length when building the trie dictionary, in class _TrieDictionaryBuilder_, method _buildTrieBytes_, through:

_private void positiveShortPreCheck(int i, String fieldName) {
    if (!BytesUtil.isPositiveShort(i)) {
        throw new IllegalStateException(fieldName + " is not positive short, usually caused by too long dict value.");
    }
}_

_public static boolean isPositiveShort(int i) {
    return (i & 0x7000) == 0;
}_

And 0x7000 in binary is 0111 0000 0000 0000, so the value length must be less than 0001 0000 0000 0000 (4096), i.e. at most 4095 in decimal. I wonder why it is 0x7000. Should 0x8000 (1000 0000 0000 0000), supporting max length 0111 1111 1111 1111 (32767), be what you want? And 32767 may be too large; I prefer to use 0xE000 (1110 0000 0000 0000), supporting max length 0001 1111 1111 1111 (8191).

> building trie dictionary blocked on value of length over 4095
> --
>
> Key: KYLIN-2956
> URL: https://issues.apache.org/jira/browse/KYLIN-2956
> Project: Kylin
> Issue Type: Bug
> Components: General
> Reporter: Wang, Gang
> Assignee: Wang, Gang
>
> In the new release, Kylin checks the value length when building the trie
> dictionary, in class TrieDictionaryBuilder, method buildTrieBytes, through:
>
> _private void positiveShortPreCheck(int i, String fieldName) {
>     if (!BytesUtil.isPositiveShort(i)) {
>         throw new IllegalStateException(fieldName + " is not positive short, usually caused by too long dict value.");
>     }
> }_
>
> _public static boolean isPositiveShort(int i) {
>     return (i & 0x7000) == 0;
> }_
>
> And 0x7000 in binary is 0111 0000 0000 0000, so the value length must be
> less than 0001 0000 0000 0000 (4096), i.e. at most 4095 in decimal.
> I wonder why it is 0x7000. Should 0x8000 (1000 0000 0000 0000), supporting
> max length 0111 1111 1111 1111 (32767), be what you want?
> And 32767 may be too large; I prefer to use 0xE000
> (1110 0000 0000 0000), supporting max length 0001 1111 1111 1111 (8191).
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (KYLIN-2956) building trie dictionary blocked on value of length over 4095
[ https://issues.apache.org/jira/browse/KYLIN-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang reassigned KYLIN-2956:
-
Assignee: Wang, Gang

> building trie dictionary blocked on value of length over 4095
> --
>
> Key: KYLIN-2956
> URL: https://issues.apache.org/jira/browse/KYLIN-2956
> Project: Kylin
> Issue Type: Bug
> Components: General
> Reporter: Wang, Gang
> Assignee: Wang, Gang
>
> In the new release, Kylin checks the value length when building the trie
> dictionary, in class _TrieDictionaryBuilder_, method _buildTrieBytes_, through:
>
> _private void positiveShortPreCheck(int i, String fieldName) {
>     if (!BytesUtil.isPositiveShort(i)) {
>         throw new IllegalStateException(fieldName + " is not positive short, usually caused by too long dict value.");
>     }
> }_
>
> _public static boolean isPositiveShort(int i) {
>     return (i & 0x7000) == 0;
> }_
>
> And 0x7000 in binary is 0111 0000 0000 0000, so the value length must be
> less than 0001 0000 0000 0000 (4096), i.e. at most 4095 in decimal.
> I wonder why it is 0x7000. Should 0x8000 (1000 0000 0000 0000), supporting
> max length 0111 1111 1111 1111 (32767), be what you want?
> And 32767 may be too large; I prefer to use 0xE000
> (1110 0000 0000 0000), supporting max length 0001 1111 1111 1111 (8191).
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KYLIN-2903) support cardinality calculation for Hive view
[ https://issues.apache.org/jira/browse/KYLIN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214668#comment-16214668 ] Wang, Gang commented on KYLIN-2903:
---
Will use the HQL statement count(distinct _column_) to calculate _column_ cardinality.

> support cardinality calculation for Hive view
> -
>
> Key: KYLIN-2903
> URL: https://issues.apache.org/jira/browse/KYLIN-2903
> Project: Kylin
> Issue Type: Improvement
> Components: Job Engine
> Reporter: Wang, Gang
> Assignee: Wang, Gang
> Priority: Minor
>
> Currently, Kylin leverages HCatalog to calculate column cardinality for Hive
> tables. However, HCatalog does not actually support Hive views.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
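The comment above proposes computing cardinality with a count(distinct ...) HQL query, which works for views as well as tables because it executes as a real query instead of reading HCatalog metadata. A minimal sketch of building such a statement; the class and method names are hypothetical, not Kylin's actual API:

```java
// Illustrative sketch only: CardinalityHql is a hypothetical helper, not
// Kylin's actual API. A COUNT(DISTINCT ...) query works on a Hive view
// exactly as on a table, since it does not depend on HCatalog metadata.
public class CardinalityHql {
    static String buildCardinalityHql(String table, String column) {
        return "SELECT COUNT(DISTINCT " + column + ") FROM " + table;
    }

    public static void main(String[] args) {
        // A view can be queried exactly like a table:
        System.out.println(buildCardinalityHql("kylin_sales_view", "seller_id"));
        // prints: SELECT COUNT(DISTINCT seller_id) FROM kylin_sales_view
    }
}
```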
[jira] [Created] (KYLIN-2956) building trie dictionary blocked on value of length over 4095
Wang, Gang created KYLIN-2956:
-
Summary: building trie dictionary blocked on value of length over 4095
Key: KYLIN-2956
URL: https://issues.apache.org/jira/browse/KYLIN-2956
Project: Kylin
Issue Type: Bug
Components: General
Reporter: Wang, Gang

In the new release, Kylin checks the value length when building the trie dictionary, in class _TrieDictionaryBuilder_, method _buildTrieBytes_, through:

_private void positiveShortPreCheck(int i, String fieldName) {
    if (!BytesUtil.isPositiveShort(i)) {
        throw new IllegalStateException(fieldName + " is not positive short, usually caused by too long dict value.");
    }
}_

_public static boolean isPositiveShort(int i) {
    return (i & 0x7000) == 0;
}_

And 0x7000 in binary is 0111 0000 0000 0000, so the value length must be less than 0001 0000 0000 0000 (4096), i.e. at most 4095 in decimal. I wonder why it is 0x7000. Should 0x8000 (1000 0000 0000 0000), supporting max length 0111 1111 1111 1111 (32767), be what you want? And 32767 may be too large; I prefer to use 0xE000 (1110 0000 0000 0000), supporting max length 0001 1111 1111 1111 (8191).
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KYLIN-2913) Enable job retry for configurable exceptions
Wang, Gang created KYLIN-2913:
-
Summary: Enable job retry for configurable exceptions
Key: KYLIN-2913
URL: https://issues.apache.org/jira/browse/KYLIN-2913
Project: Kylin
Issue Type: Improvement
Components: Job Engine
Affects Versions: v2.1.0
Reporter: Wang, Gang
Assignee: Dong Li
Fix For: v2.2.0

In our production environment, we often see certain exceptions from Hadoop or HBase, such as "org.apache.kylin.job.exception.NoEnoughReplicationException" or "java.util.ConcurrentModificationException", which cause job failures even though they can actually be handled by retrying. It would be much more convenient if jobs could be retried on a configurable list of exceptions.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
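The proposal above can be sketched as a runner that retries a job step only when the thrown exception's class name appears in a configured set. Everything here (class, method, and config shape) is a hypothetical illustration, not Kylin's job engine implementation:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of retry-on-configurable-exceptions; names are
// illustrative, not Kylin's actual job engine API.
public class RetryingRunner {
    private final Set<String> retryableExceptions;  // fully-qualified class names from config
    private final int maxRetries;

    public RetryingRunner(Set<String> retryableExceptions, int maxRetries) {
        this.retryableExceptions = retryableExceptions;
        this.maxRetries = maxRetries;
    }

    public void run(Runnable step) {
        for (int attempt = 0; ; attempt++) {
            try {
                step.run();
                return;  // success
            } catch (RuntimeException e) {
                // Retry only if this exception type is configured as retryable
                // and attempts remain; otherwise rethrow and fail the job.
                if (!retryableExceptions.contains(e.getClass().getName()) || attempt >= maxRetries) {
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) {
        Set<String> retryable = new HashSet<>(
                Arrays.asList("java.util.ConcurrentModificationException"));
        RetryingRunner runner = new RetryingRunner(retryable, 3);
        int[] attempts = {0};
        runner.run(() -> {
            attempts[0]++;
            if (attempts[0] < 3) {
                throw new java.util.ConcurrentModificationException();
            }
        });
        System.out.println("succeeded after " + attempts[0] + " attempts");
        // prints: succeeded after 3 attempts
    }
}
```

Matching on fully-qualified class names keeps the retry list purely configuration-driven, so operators can add new transient exceptions without a code change.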
[jira] [Updated] (KYLIN-2903) support cardinality calculation for Hive view
[ https://issues.apache.org/jira/browse/KYLIN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang, Gang updated KYLIN-2903:
--
Description:
Currently, Kylin leverages HCatalog to calculate column cardinality for Hive tables. However, HCatalog does not actually support Hive views.
(was: Currently, Kylin leverage HCatlog to calculate column cardinality for Hive tables. While, HCatlog does not support Hive view actually.)

> support cardinality calculation for Hive view
> -
>
> Key: KYLIN-2903
> URL: https://issues.apache.org/jira/browse/KYLIN-2903
> Project: Kylin
> Issue Type: Improvement
> Components: Job Engine
> Reporter: Wang, Gang
> Assignee: Zhong Yanghong
> Priority: Minor
>
> Currently, Kylin leverages HCatalog to calculate column cardinality for Hive
> tables. However, HCatalog does not actually support Hive views.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KYLIN-2903) support cardinality calculation for Hive view
Wang, Gang created KYLIN-2903:
-
Summary: support cardinality calculation for Hive view
Key: KYLIN-2903
URL: https://issues.apache.org/jira/browse/KYLIN-2903
Project: Kylin
Issue Type: Improvement
Components: Job Engine
Reporter: Wang, Gang
Assignee: Dong Li
Priority: Minor

Currently, Kylin leverages HCatalog to calculate column cardinality for Hive tables. However, HCatalog does not actually support Hive views.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)