[jira] [Updated] (KYLIN-5017) Support project level mapreduce queue config

2021-06-27 Thread vergilchiu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vergilchiu updated KYLIN-5017:
--
Description: 
We have a lot of cubes in our company, and we group the cubes into 
different projects based on user groups.

We want these cubes to use different mapreduce queues. Although the cube-level 
mapreduce queue config can meet our needs, we would have to set the mapreduce 
queue for each cube. That's a lot of work.

Now we need a project-level mapreduce config, so that cubes can use different 
queues for different projects.

  was:
We have a lot of cubes in our company, and we group the cubes into 
different projects based on user groups.

We want these cubes to use different mapreduce queues. Although the cube-level 
mapreduce queue config can meet our needs, we would have to set the mapreduce 
queue for each cube. That's a lot of work.

Now we need a project-level mapreduce config , cube can different queue for 
different project.


> Support project level mapreduce queue config
> 
>
> Key: KYLIN-5017
> URL: https://issues.apache.org/jira/browse/KYLIN-5017
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v3.1.2
>Reporter: vergilchiu
>Priority: Major
>
> We have a lot of cubes in our company, and we group the cubes into 
> different projects based on user groups.
> We want these cubes to use different mapreduce queues. Although the cube-level 
> mapreduce queue config can meet our needs, we would have to set the mapreduce 
> queue for each cube. That's a lot of work.
> Now we need a project-level mapreduce config, so that cubes can use different 
> queues for different projects.
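
The precedence this request implies (a cube-level setting wins, then the project-level setting, then the global default) can be sketched as below. This is only an illustration under stated assumptions: the class QueueResolver, the three override maps and the queue names are hypothetical, and nothing here is Kylin's actual configuration API. Only mapreduce.job.queuename is a real Hadoop property name.

```java
import java.util.Map;

// Hypothetical sketch: resolve a MapReduce queue with cube > project > global precedence.
// None of these names come from Kylin itself.
final class QueueResolver {
    // Real Hadoop property name; how Kylin stores per-project overrides is assumed here.
    static final String QUEUE_KEY = "mapreduce.job.queuename";

    static String resolveQueue(Map<String, String> cubeOverrides,
                               Map<String, String> projectOverrides,
                               Map<String, String> globalConfig) {
        // 1. A cube-level override, if present, always wins.
        String fromCube = cubeOverrides.get(QUEUE_KEY);
        if (fromCube != null) {
            return fromCube;
        }
        // 2. Otherwise every cube in the project shares the project-level queue.
        String fromProject = projectOverrides.get(QUEUE_KEY);
        if (fromProject != null) {
            return fromProject;
        }
        // 3. Fall back to the global setting, or to Hadoop's "default" queue.
        return globalConfig.getOrDefault(QUEUE_KEY, "default");
    }
}
```

With this ordering, a project only needs one queue entry, and an individual cube sets its own entry only when it must deviate from the rest of its project.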



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-5017) Support project level mapreduce queue config

2021-06-27 Thread vergilchiu (Jira)
vergilchiu created KYLIN-5017:
-

 Summary: Support project level mapreduce queue config
 Key: KYLIN-5017
 URL: https://issues.apache.org/jira/browse/KYLIN-5017
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine
Affects Versions: v3.1.2
Reporter: vergilchiu


We have a lot of cubes in our company, and we group the cubes into 
different projects based on user groups.

We want these cubes to use different mapreduce queues. Although the cube-level 
mapreduce queue config can meet our needs, we would have to set the mapreduce 
queue for each cube. That's a lot of work.

Now we need a project-level mapreduce config, so that cubes can use different 
queues for different projects.





[jira] [Commented] (KYLIN-4985) optimize kylin planner by delete unnecessary cuboids

2021-06-27 Thread tianhui (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370399#comment-17370399
 ] 

tianhui commented on KYLIN-4985:


Hi [~yaho], I agree that *kylin.cube.cubeplanner.expansion-threshold* can control 
redundancy, but wouldn't it be better to simply delete unhit cuboids in variable 
space? Why should Kylin prefer to build the parent cuboid rather than the hit 
cuboid itself?

As for the weighting change, it's just a heuristic that penalizes cuboids that 
are hit less often. Because the *hitProbability* in CuboidBenefitModel appears 
never to be used, I think it's better to take hit probability into account when 
recommending cuboids.

 

> optimize kylin planner by delete unnecessary cuboids
> 
>
> Key: KYLIN-4985
> URL: https://issues.apache.org/jira/browse/KYLIN-4985
> Project: Kylin
>  Issue Type: New Feature
>Reporter: tianhui
>Priority: Major
>
> When I use Kylin Planner, the recommendation result contains many cuboids that 
> were never hit by my historical queries. I think they may be unnecessary, so I 
> delete the unhit cuboids.
> In addition, I weight the row count by 1/sqrt(hit probability) 
> before executing the planning algorithm.
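
The two changes described above can be sketched as follows. This is a minimal illustration, not the code of the actual patch: the class HitWeighting and its methods are hypothetical names, and it assumes hit probabilities lie in (0, 1] with unhit cuboids simply absent from the probability map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the weighting described in the issue; not Kylin's CuboidBenefitModel.
final class HitWeighting {
    // Effective row count = rowCount / sqrt(hitProbability): rarely hit cuboids look "larger".
    static double weightedRowCount(long rowCount, double hitProbability) {
        if (hitProbability <= 0.0 || hitProbability > 1.0) {
            throw new IllegalArgumentException("hit probability must be in (0, 1]");
        }
        return rowCount / Math.sqrt(hitProbability);
    }

    // Drops cuboids without any recorded hit and weights the remaining ones.
    static Map<Long, Double> weightCandidates(Map<Long, Long> rowCounts,
                                              Map<Long, Double> hitProbabilities) {
        Map<Long, Double> weighted = new LinkedHashMap<>();
        for (Map.Entry<Long, Long> entry : rowCounts.entrySet()) {
            Double probability = hitProbabilities.get(entry.getKey());
            if (probability == null) {
                continue; // never hit by historical queries: removed from the candidates
            }
            weighted.put(entry.getKey(), weightedRowCount(entry.getValue(), probability));
        }
        return weighted;
    }
}
```

Under this weighting a cuboid hit by a quarter of the queries is treated as if it had twice its real row count, so the planner sees it as more expensive than an equally sized cuboid that is hit by every query.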





[jira] [Commented] (KYLIN-5007) queries with limit clause may fail when string dimension is encoded in integer type

2021-06-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370219#comment-17370219
 ] 

ASF subversion and git services commented on KYLIN-5007:


Commit 7d1682a9359d2dc0c1292b6a385abd80c559c4d4 in kylin's branch 
refs/heads/master from tianhui5
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=7d1682a ]

KYLIN-5007 queries with limit clause may fail when string dimension i… (#1664)

* KYLIN-5007 queries with limit clause may fail when string dimension is 
encoded in integer type

* add KYLIN-4942 fix

Co-authored-by: tianhui5 

> queries with limit clause may fail when string dimension is encoded in 
> integer type
> ---
>
> Key: KYLIN-5007
> URL: https://issues.apache.org/jira/browse/KYLIN-5007
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v3.0.2
>Reporter: Congling Xia
>Assignee: Congling Xia
>Priority: Major
> Attachments: image-2021-06-10-10-03-54-775.png
>
>
> Hi, team.
> Recently we encountered a problem where queries may fail if there is a LIMIT in 
> the SQL. The SQL looks like:
> {code}
> select gid from some_table group by gid limit 100
> {code}
> The error message is like the following:
> {code:java}
> Not sorted! last: source_v1=null,...,gid=276,... fetched: 
> source_v1=null,...,gid=100506,...
> {code}
> After searching the issue list, we found that it is similar to KYLIN-2425, 
> KYLIN-3089, and KYLIN-4942. We noticed that these problems are not completely 
> resolved.
> It is a row-key encoding problem: the cube uses integer:4 to encode the string 
> column _gid_:
> !image-2021-06-10-10-03-54-775.png|width=571,height=141!
> As [~kangkaisen] mentioned in KYLIN-3089, the comparator in 
> SortMergedPartitionResultIterator is different from the one in 
> SortedIteratorMergerWithLimit. SortedIteratorMergerWithLimit compares tuples 
> of dimensions in their original data type "string" rather than the encoded data 
> type "integer" used in rowkeys. In the exception message above, 276<100506 is 
> false because the values are compared as strings.
> It may be resolved by skipping limit pushdown when the column type and the 
> encoding type may produce different comparison results, but this may make such 
> queries slower.
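
The mismatch between the two orderings can be reproduced in isolation. The sketch below is only an illustration: the class ComparatorMismatch and its members are hypothetical and unrelated to Kylin's iterator classes. It compares the same pair of values once as strings and once as integers, and reports whether both orderings agree.

```java
import java.util.Comparator;

// Hypothetical illustration of the ordering mismatch; not Kylin's comparator code.
final class ComparatorMismatch {
    // Ordering of the column's declared type ("string"): lexicographic.
    static final Comparator<String> AS_STRING = Comparator.naturalOrder();

    // Ordering implied by an integer encoding of the same values: numeric.
    static final Comparator<String> AS_INTEGER = Comparator.comparingInt(Integer::parseInt);

    // True when both comparators put a and b in the same relative order.
    static boolean orderingsAgree(String a, String b) {
        return Integer.signum(AS_STRING.compare(a, b))
                == Integer.signum(AS_INTEGER.compare(a, b));
    }
}
```

For the values in the error message, AS_STRING places "276" after "100506" (because '2' sorts after '1'), while AS_INTEGER places it before, so a stream sorted by one ordering looks unsorted to a check that uses the other.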





[jira] [Commented] (KYLIN-4942) dimension encoding boolean, query return Not sorted! last: exception

2021-06-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370220#comment-17370220
 ] 

ASF subversion and git services commented on KYLIN-4942:


Commit 7d1682a9359d2dc0c1292b6a385abd80c559c4d4 in kylin's branch 
refs/heads/master from tianhui5
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=7d1682a ]

KYLIN-5007 queries with limit clause may fail when string dimension i… (#1664)

* KYLIN-5007 queries with limit clause may fail when string dimension is 
encoded in integer type

* add KYLIN-4942 fix

Co-authored-by: tianhui5 

> dimension encoding boolean, query return Not sorted! last: exception
> 
>
> Key: KYLIN-4942
> URL: https://issues.apache.org/jira/browse/KYLIN-4942
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v3.0.2, v3.1.1
>Reporter: xue lin
>Assignee: Congling Xia
>Priority: Major
>
> I set the encoding of one dimension (fta_flag) to boolean, but when I execute the 
> SQL below, it returns a "Not sorted! last:" exception.
> select 
>  fta_flag fta_flag
> from fact_subscriber fs
> where etl_dt = '20210319'
> group by fta_flag
>  
> When I disable the limit checkbox, it works.
> When I add "order by fta_flag" to the SQL, it works.
> Issue 3089 may be related to my problem:
> https://issues.apache.org/jira/browse/KYLIN-3089





[jira] [Commented] (KYLIN-5007) queries with limit clause may fail when string dimension is encoded in integer type

2021-06-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370218#comment-17370218
 ] 

ASF subversion and git services commented on KYLIN-5007:


Commit 7d1682a9359d2dc0c1292b6a385abd80c559c4d4 in kylin's branch 
refs/heads/master from tianhui5
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=7d1682a ]

KYLIN-5007 queries with limit clause may fail when string dimension i… (#1664)

* KYLIN-5007 queries with limit clause may fail when string dimension is 
encoded in integer type

* add KYLIN-4942 fix

Co-authored-by: tianhui5 



[jira] [Commented] (KYLIN-5007) queries with limit clause may fail when string dimension is encoded in integer type

2021-06-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370217#comment-17370217
 ] 

ASF GitHub Bot commented on KYLIN-5007:
---

hit-lacus merged pull request #1664:
URL: https://github.com/apache/kylin/pull/1664


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@kylin.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kylin] hit-lacus merged pull request #1664: KYLIN-5007 queries with limit clause may fail when string dimension i…

2021-06-27 Thread GitBox


hit-lacus merged pull request #1664:
URL: https://github.com/apache/kylin/pull/1664


   






[jira] [Commented] (KYLIN-5011) Detect and scatter skewed data in dict encoding step

2021-06-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370215#comment-17370215
 ] 

ASF GitHub Bot commented on KYLIN-5011:
---

hit-lacus merged pull request #1662:
URL: https://github.com/apache/kylin/pull/1662


   




> Detect and scatter skewed data in dict encoding step
> 
>
> Key: KYLIN-5011
> URL: https://issues.apache.org/jira/browse/KYLIN-5011
> Project: Kylin
>  Issue Type: New Feature
>  Components: Job Engine
>Affects Versions: v4.0.0-beta
>Reporter: ShengJun Zheng
>Assignee: ShengJun Zheng
>Priority: Major
> Fix For: v4.0.0
>
> Attachments: image-2021-06-15-10-54-19-419.png
>
>
> In Kylin 4, dictionaries are hashed into several buckets, and column data are 
> repartitioned so that the number of partitions equals the number of buckets. 
> Each encoding task then only needs to load one dictionary bucket, which 
> accelerates the encoding step.
> Recently this improvement has caused us trouble when data skew occurs. In some 
> of our cases, the repartition step during encoding cannot even finish. This 
> works fine in Kylin 3, because each Spark task loads the whole dictionary of a 
> column and encodes the column values to int values, so no repartition step is 
> needed.
> We solve this by:
>  # sampling the source data and detecting skewed data
>  # building a dictionary for the skewed data
>  # customizing a repartition function to scatter skewed data to random 
> partitions
>  # encoding with both the skewed-data dictionary and the dictionary loaded 
> within each partition
> After this improvement, the build time of some of our cubes dropped from 
> 190 min to 30 min.
> !image-2021-06-15-10-54-19-419.png!
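
Steps 1 and 3 of the list above (detecting skewed values from a sample, then scattering them) can be sketched without Spark. This is only an illustration under stated assumptions: the class SkewScatter, the frequency threshold and the partition count are hypothetical, and the real patch works on Spark datasets rather than in-memory lists.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch of skew detection and scattering; not the code merged in PR #1662.
final class SkewScatter {
    // A value is treated as skewed when it makes up more than `threshold` of the sample.
    static Set<String> detectSkewed(List<String> sample, double threshold) {
        Map<String, Integer> counts = new HashMap<>();
        for (String value : sample) {
            counts.merge(value, 1, Integer::sum);
        }
        Set<String> skewed = new HashSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > threshold * sample.size()) {
                skewed.add(entry.getKey());
            }
        }
        return skewed;
    }

    // Skewed values go to a random partition; all other values keep hash partitioning.
    static int partitionFor(String value, Set<String> skewed, int numPartitions, Random random) {
        if (skewed.contains(value)) {
            return random.nextInt(numPartitions);
        }
        return Math.floorMod(value.hashCode(), numPartitions);
    }
}
```

Because a scattered value may land in any partition, its dictionary entry can no longer be found in that partition's bucket. That is why the description encodes with the separately built skewed-data dictionary in addition to the per-partition one.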





[jira] [Commented] (KYLIN-5011) Detect and scatter skewed data in dict encoding step

2021-06-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370216#comment-17370216
 ] 

ASF subversion and git services commented on KYLIN-5011:


Commit 914b97f5cf2347030525140038d060178b93f955 in kylin's branch 
refs/heads/kylin-on-parquet-v2 from zhengshengjun
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=914b97f ]

KYLIN-5011 Detect and scatter skewed data in dict encoding step (#1662)

Co-authored-by: Xiaoxiang Yu 



[GitHub] [kylin] hit-lacus merged pull request #1662: KYLIN-5011 Detect and scatter skewed data in dict encoding step

2021-06-27 Thread GitBox


hit-lacus merged pull request #1662:
URL: https://github.com/apache/kylin/pull/1662


   






[jira] [Comment Edited] (KYLIN-4985) optimize kylin planner by delete unnecessary cuboids

2021-06-27 Thread Zhong Yanghong (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370190#comment-17370190
 ] 

Zhong Yanghong edited comment on KYLIN-4985 at 6/27/21, 10:37 AM:
--

Hi [~tianhui5], the cuboids recommended by the cube planner algorithms are 
redundant. The redundancy is controlled by 
*kylin.cube.cubeplanner.expansion-threshold*.

One more thing about the recommended cuboids that were never hit by your 
historical queries: if cuboid A is the parent of cuboid B and their row 
counts are similar, then even when your historical queries always hit cuboid B, 
Kylin should prefer to build cuboid A.

For the weighting change, could you explain more about the mathematical theory? 
At first glance, it does not follow the monotonicity of the probability.


was (Author: yaho):
Hi [~tianhui5], the cuboids recommended by the cube planner algorithms are 
redundant. The redundancy is controlled by 
*kylin.cube.cubeplanner.expansion-threshold*.

One more thing about the recommended cuboids that were never hit by your 
historical queries: if cuboid A is the parent of cuboid B and their row 
counts are similar, then even when your historical queries always hit cuboid B, 
Kylin should prefer to build cuboid A.

For the weighting change, could you explain more about the mathematical theory? 
At first glance, it does not follow monotonicity.



[jira] [Commented] (KYLIN-4985) optimize kylin planner by delete unnecessary cuboids

2021-06-27 Thread Zhong Yanghong (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370190#comment-17370190
 ] 

Zhong Yanghong commented on KYLIN-4985:
---

Hi [~tianhui5], the cuboids recommended by the cube planner algorithms are 
redundant. The redundancy is controlled by 
*kylin.cube.cubeplanner.expansion-threshold*.

One more thing about the recommended cuboids that were never hit by your 
historical queries: if cuboid A is the parent of cuboid B and their row 
counts are similar, then even when your historical queries always hit cuboid B, 
Kylin should prefer to build cuboid A.

For the weighting change, could you explain more about the mathematical theory? 
At first glance, it does not follow monotonicity.



[jira] [Assigned] (KYLIN-4165) RT OLAP building job on "Save Cube Dictionaries" step concurrency error

2021-06-27 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu reassigned KYLIN-4165:
---

Assignee: wangxiaojing

> RT OLAP building job on "Save Cube Dictionaries" step concurrency error
> ---
>
> Key: KYLIN-4165
> URL: https://issues.apache.org/jira/browse/KYLIN-4165
> Project: Kylin
>  Issue Type: Bug
>  Components: Real-time Streaming
>Affects Versions: v3.0.0-alpha
>Reporter: wangxiaojing
>Assignee: wangxiaojing
>Priority: Major
> Fix For: v3.0.0
>
>
> There is a dictionary version conflict in the "Save Cube Dictionaries" step when 
> building a real-time segment from remote persisted to ready. This is very 
> serious: it causes dictionary updates made by multiple concurrent jobs to 
> fail. It may occur when a cube has many concurrent building jobs on the same 
> step, "Save Cube Dictionaries".
> Perhaps a globally distributed lock is needed to prevent this step from running 
> concurrently for one cube.
> Save Cube Dictionaries log messages:
> {code:java}
> org.apache.kylin.common.persistence.WriteConflictException: Overwriting conflict /dict/DEFAULT.TASK_SNAPSHOT/GROUPVALUE/5387e747-9649-0b17-5a72-ee17f5baea0a.dict, expect old TS 1568012509090, but it is 1568012509245
>     at org.apache.kylin.storage.hbase.HBaseResourceStore.updateTimestampImpl(HBaseResourceStore.java:372)
>     at org.apache.kylin.common.persistence.ResourceStore$7.call(ResourceStore.java:465)
>     at org.apache.kylin.common.persistence.ExponentialBackoffRetry.doWithRetry(ExponentialBackoffRetry.java:52)
>     at org.apache.kylin.common.persistence.ResourceStore.updateTimestampWithRetry(ResourceStore.java:462)
>     at org.apache.kylin.common.persistence.ResourceStore.updateTimestampCheckPoint(ResourceStore.java:457)
>     at org.apache.kylin.common.persistence.ResourceStore.updateTimestamp(ResourceStore.java:452)
>     at org.apache.kylin.dict.DictionaryManager.updateExistingDictLastModifiedTime(DictionaryManager.java:197)
>     at org.apache.kylin.dict.DictionaryManager.trySaveNewDict(DictionaryManager.java:157)
>     at org.apache.kylin.engine.mr.streaming.SaveDictStep.doWork(SaveDictStep.java:122)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>     at org.apache.kylin.job.impl.threadpool.DistributedScheduler$JobRunner.run(DistributedScheduler.java:110)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
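
The suggestion of serializing this step per cube can be sketched as below. This is only an illustration: PerCubeStepLock is a hypothetical class, and it uses an in-process ReentrantLock as a stand-in. Jobs running on different Kylin nodes would need a genuinely distributed lock, which this sketch does not provide.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch: one lock per cube so that a step never runs concurrently for the same cube.
// In-process only; a multi-node deployment would replace ReentrantLock with a distributed lock.
final class PerCubeStepLock {
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    <T> T runExclusively(String cubeName, Supplier<T> step) {
        // Create the cube's lock on first use; later callers share the same instance.
        ReentrantLock lock = locks.computeIfAbsent(cubeName, name -> new ReentrantLock());
        lock.lock();
        try {
            return step.get();
        } finally {
            lock.unlock(); // always release, even if the step throws
        }
    }
}
```

Locking per cube rather than globally keeps jobs of different cubes running in parallel, while the read-check-write sequence on one cube's dictionaries is never interleaved with another job for the same cube.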





[jira] [Resolved] (KYLIN-5016) Avoid potential NPE issue in RDBMS Pushdown case

2021-06-27 Thread Xiaoxiang Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu resolved KYLIN-5016.
-
Fix Version/s: v3.1.3
   Resolution: Fixed

> Avoid potential NPE issue in RDBMS Pushdown case
> 
>
> Key: KYLIN-5016
> URL: https://issues.apache.org/jira/browse/KYLIN-5016
> Project: Kylin
>  Issue Type: Improvement
>  Components: RDBMS Source
>Affects Versions: v2.6.5, v3.1.2
>Reporter: rongchuan.jin
>Assignee: rongchuan.jin
>Priority: Minor
> Fix For: v3.1.3
>
>
> When I use pushdown with an RDBMS source, I encounter an error during SQL 
> conversion, which causes the pushdown to fail.
> I found the stack trace below:
> {code:java}
> 2020-12-23 13:14:10,212 ERROR [Query a1bf28bb-de28-433e-ab96-28ce234a1a4a-76] conv.SqlConverter : Failed to default convert sql, will use the origin input: select 1 from `MOVIES_10M`.`DIM_MOVIES_10M`
> LIMIT 500
> java.lang.NullPointerException
>     at org.apache.calcite.sql.SqlCall.unparse(SqlCall.java:103)
>     at org.apache.calcite.sql.pretty.SqlPrettyWriter.format(SqlPrettyWriter.java:806)
>     at org.apache.kylin.sdk.datasource.framework.conv.SqlConverter.convertSql(SqlConverter.java:69)
>     at org.apache.kylin.sdk.datasource.framework.JdbcConnector.convertSql(JdbcConnector.java:91)
>     at org.apache.kylin.sdk.datasource.PushDownRunnerSDKImpl.executeQuery(PushDownRunnerSDKImpl.java:55)
>     at org.apache.kylin.query.util.PushDownUtil.tryPushDownQuery(PushDownUtil.java:173)
>     at org.apache.kylin.query.util.PushDownUtil.tryPushDownSelectQuery(PushDownUtil.java:103)
>     at org.apache.kylin.rest.service.QueryService.tryPushDownSelectQuery(QueryService.java:773)
>     at org.apache.kylin.rest.service.QueryService.pushDownQuery(QueryService.java:709)
>     at org.apache.kylin.rest.service.QueryService.queryWithSqlMassage(QueryService.java:700)
>     at org.apache.kylin.rest.service.QueryService.query(QueryService.java:231)
>     at org.apache.kylin.rest.service.QueryService.queryAndUpdateCache(QueryService.java:577)
>     at org.apache.kylin.rest.service.QueryService.queryWithCache(QueryService.java:512)
>     at org.apache.kylin.rest.service.QueryService.doQueryWithCache(QueryService.java:395)
> {code}
>  
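
The log line shows the converter falling back to the original SQL after a failure. That fallback pattern can be sketched as below. This is only an illustration: SafeSqlConversion is a hypothetical class, the converter is passed in as a plain function, and the sketch does not claim to show what the KYLIN-5016 patch actually changed.

```java
import java.util.function.UnaryOperator;

// Hypothetical sketch of a "convert, else keep the original SQL" guard; not Kylin's SqlConverter.
final class SafeSqlConversion {
    static String convertOrFallback(String originalSql, UnaryOperator<String> converter) {
        try {
            String converted = converter.apply(originalSql);
            // Treat a null result like a failed conversion.
            return converted != null ? converted : originalSql;
        } catch (RuntimeException e) {
            // Includes NullPointerException thrown inside the converter.
            return originalSql;
        }
    }
}
```

Catching RuntimeException keeps a pushdown query alive when conversion breaks, at the cost of hiding converter bugs, so a real implementation would at least log the exception as the message above does.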





[jira] [Commented] (KYLIN-5016) Avoid potential NPE issue in RDBMS Pushdown case

2021-06-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370183#comment-17370183
 ] 

ASF GitHub Bot commented on KYLIN-5016:
---

hit-lacus merged pull request #1671:
URL: https://github.com/apache/kylin/pull/1671


   




[jira] [Commented] (KYLIN-5016) Avoid potential NPE issue in RDBMS Pushdown case

2021-06-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370182#comment-17370182
 ] 

ASF subversion and git services commented on KYLIN-5016:


Commit e7252898eda260d87639d4d61fc01e1ef828f226 in kylin's branch 
refs/heads/master from woyumen4597
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=e725289 ]

KYLIN-5016 avoid npe issue in rdbms pushdown


> Avoid potential NPE issue in RDBMS Pushdown case
> 
>
> Key: KYLIN-5016
> URL: https://issues.apache.org/jira/browse/KYLIN-5016
> Project: Kylin
>  Issue Type: Improvement
>  Components: RDBMS Source
>Affects Versions: v2.6.5, v3.1.2
>Reporter: rongchuan.jin
>Assignee: rongchuan.jin
>Priority: Minor
>
> When I use pushdown with RDBMS source, I encounter some error when convert 
> sql which leads to pushdown failing.
> I find below stacktrace like
> {code:java}
> 2020-12-23 13:14:10,212 ERROR [Query a1bf28bb-de28-433e-ab96-28ce234a1a4a-76] conv.SqlConverter : Failed to default convert sql, will use the origin input: select 1 from `MOVIES_10M`.`DIM_MOVIES_10M`
> LIMIT 500
> java.lang.NullPointerException
>     at org.apache.calcite.sql.SqlCall.unparse(SqlCall.java:103)
>     at org.apache.calcite.sql.pretty.SqlPrettyWriter.format(SqlPrettyWriter.java:806)
>     at org.apache.kylin.sdk.datasource.framework.conv.SqlConverter.convertSql(SqlConverter.java:69)
>     at org.apache.kylin.sdk.datasource.framework.JdbcConnector.convertSql(JdbcConnector.java:91)
>     at org.apache.kylin.sdk.datasource.PushDownRunnerSDKImpl.executeQuery(PushDownRunnerSDKImpl.java:55)
>     at org.apache.kylin.query.util.PushDownUtil.tryPushDownQuery(PushDownUtil.java:173)
>     at org.apache.kylin.query.util.PushDownUtil.tryPushDownSelectQuery(PushDownUtil.java:103)
>     at org.apache.kylin.rest.service.QueryService.tryPushDownSelectQuery(QueryService.java:773)
>     at org.apache.kylin.rest.service.QueryService.pushDownQuery(QueryService.java:709)
>     at org.apache.kylin.rest.service.QueryService.queryWithSqlMassage(QueryService.java:700)
>     at org.apache.kylin.rest.service.QueryService.query(QueryService.java:231)
>     at org.apache.kylin.rest.service.QueryService.queryAndUpdateCache(QueryService.java:577)
>     at org.apache.kylin.rest.service.QueryService.queryWithCache(QueryService.java:512)
>     at org.apache.kylin.rest.service.QueryService.doQueryWithCache(QueryService.java:395)
> {code}
>  





[GitHub] [kylin] hit-lacus merged pull request #1671: KYLIN-5016 avoid npe issue in rdbms pushdown

2021-06-27 Thread GitBox


hit-lacus merged pull request #1671:
URL: https://github.com/apache/kylin/pull/1671


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@kylin.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (KYLIN-4995) Query exception when the query statement contains a single left parenthesis

2021-06-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370179#comment-17370179
 ] 

ASF subversion and git services commented on KYLIN-4995:


Commit 6621a2aaa750540054fe6951dcfacf5fbaabb166 in kylin's branch 
refs/heads/kylin-on-parquet-v2 from 7mming7
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=6621a2a ]

KYLIN-4995 fix query exception when the query statement contains a single left 
parenthesis

(cherry picked from commit 4e63c34b11496a1b79cb7ad4a1ce7b29c99a492d)


> Query exception when the query statement contains a single left parenthesis
> ---
>
> Key: KYLIN-4995
> URL: https://issues.apache.org/jira/browse/KYLIN-4995
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v3.1.2
>Reporter: Mingming Ge
>Assignee: Mingming Ge
>Priority: Critical
> Attachments: image-2021-05-19-10-15-34-215.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Queries like this:
>  
> {code:java}
> select '( a + b) * (c+ d ' from t;{code}
> throw the following exception during execution:
> !image-2021-05-19-10-15-34-215.png|width=592,height=326!
>  





[jira] [Commented] (KYLIN-4879) The function of sql to remove comments is not perfect. In some cases, the sql query conditions used will be modified

2021-06-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370178#comment-17370178
 ] 

ASF subversion and git services commented on KYLIN-4879:


Commit 50eb7be759a3532b39b67fe0f7963f33e97fbe83 in kylin's branch 
refs/heads/kylin-on-parquet-v2 from bingfeng.guo
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=50eb7be ]

KYLIN-4879 add UT

(cherry picked from commit 4c5f5fdd02c5b424b02b9496163a05820e1b97ca)


> The function of sql to remove comments is not perfect. In some cases, the sql 
> query conditions used will be modified
> 
>
> Key: KYLIN-4879
> URL: https://issues.apache.org/jira/browse/KYLIN-4879
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.6.0
>Reporter: wangjie
>Assignee: Yaqian Zhang
>Priority: Critical
> Fix For: v3.1.2, v4.0.0
>
>
> In the removeCommentInSql method of QueryUtil in the query module, if a 
> single-quoted string in the user's SQL contains -- or /**/, the regular 
> expression rewrites the SQL query condition.
> For example:
> (1) When the single-quoted string contains -- and is followed by a line break
> {quote}String sql = "select count(*) from test_kylin_fact WHERE column_name 
> ='--this is not comment'\n "+ "LIMIT 100 offset 0";
> {quote}
> After removeCommentInSql runs, it becomes:
> {quote}select count(*) from test_kylin_fact WHERE column_name = 'LIMIT 100 
> offset 0
> {quote}
> (2) When the single-quoted string contains a multi-line comment marker
> {quote}String sql = "select count(*) from test_kylin_fact WHERE column_name 
> ='/**--this *is not comment***/'";
> {quote}
> After removeCommentInSql runs, it becomes:
> {quote}select count(*) from test_kylin_fact WHERE column_name =''
> {quote}
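
Both cases above come down to treating comment markers inside a string literal as comments. One way to avoid that is a single pass that copies single-quoted literals verbatim and strips -- and /* */ only outside them. The following is a minimal sketch under that assumption; the class name SqlCommentStripper and its strip method are hypothetical and are not Kylin's QueryUtil.removeCommentInSql or the fix that was committed. It does not handle double-quoted or backtick-quoted identifiers.

```java
// Hypothetical sketch, not Kylin code: strips SQL comments outside string literals.
final class SqlCommentStripper {
    private SqlCommentStripper() {}

    // Removes "--" line comments and block comments that occur outside
    // single-quoted literals. Literals are copied unchanged.
    static String strip(String sql) {
        StringBuilder out = new StringBuilder(sql.length());
        int n = sql.length();
        int i = 0;
        while (i < n) {
            char c = sql.charAt(i);
            if (c == '\'') {
                // Copy the whole literal; a doubled quote ('') is an escaped quote.
                out.append(c);
                i++;
                while (i < n) {
                    char q = sql.charAt(i);
                    out.append(q);
                    i++;
                    if (q == '\'') {
                        if (i < n && sql.charAt(i) == '\'') {
                            out.append('\'');
                            i++;
                        } else {
                            break; // closing quote reached
                        }
                    }
                }
            } else if (c == '-' && i + 1 < n && sql.charAt(i + 1) == '-') {
                // Line comment: skip up to (not including) the newline.
                while (i < n && sql.charAt(i) != '\n') {
                    i++;
                }
            } else if (c == '/' && i + 1 < n && sql.charAt(i + 1) == '*') {
                // Block comment: skip past its terminator, or to the end if unterminated.
                int end = sql.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sql = "select count(*) from test_kylin_fact WHERE column_name "
                + "='--this is not comment'\n " + "LIMIT 100 offset 0";
        System.out.println(strip(sql));
    }
}
```

With this approach the two example statements from the description pass through unchanged, because their comment markers sit inside literals.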





[jira] [Commented] (KYLIN-4281) Precisely set the data type of tuple expression

2021-06-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370181#comment-17370181
 ] 

ASF subversion and git services commented on KYLIN-4281:


Commit f5fbe9e0c64935ae3284c8e877c1834f0a3d3663 in kylin's branch 
refs/heads/kylin-on-parquet-v2 from yaqian.zhang
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=f5fbe9e ]

Revert "KYLIN-4281 Precisely set the data type of tuple expression"

This reverts commit 5044239a


> Precisely set the data type of tuple expression
> ---
>
> Key: KYLIN-4281
> URL: https://issues.apache.org/jira/browse/KYLIN-4281
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhong Yanghong
>Assignee: Zhong Yanghong
>Priority: Major
> Fix For: v3.1.0
>
>
> Previously, to simplify the calculation of sum(case when), all binary 
> calculations were based on BigDecimal. This does not suit every case, 
> especially count(distinct case when), whose inner data type may be hll 
> or bitmap.





[jira] [Commented] (KYLIN-4355) Add validation for cube re-assignmnet(Realtime OLAP)

2021-06-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370180#comment-17370180
 ] 

ASF subversion and git services commented on KYLIN-4355:


Commit ec962036bd84cff9934ca28ba5d859683f0307d6 in kylin's branch 
refs/heads/kylin-on-parquet-v2 from XiaoxiangYu
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=ec96203 ]

KYLIN-4355 add ut

(cherry picked from commit faa15ad8b13f6ccf9161b6bb59af84f29b9bf958)


> Add validation for cube re-assignmnet(Realtime OLAP)
> 
>
> Key: KYLIN-4355
> URL: https://issues.apache.org/jira/browse/KYLIN-4355
> Project: Kylin
>  Issue Type: Bug
>  Components: Real-time Streaming
>Affects Versions: v3.0.0
>Reporter: Xiaoxiang Yu
>Assignee: Xiaoxiang Yu
>Priority: Minor
> Fix For: v3.1.0
>
>
> Case 1. In an assignment, a specific partition can be assigned to more than 
> one replica set, which causes receivers to consume duplicate Kafka messages.
> Case 2. In an assignment, you can remove all partitions from one replica set, 
> which makes no sense at all.
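
The two cases can be checked before an assignment is accepted. The sketch below is only an illustration under stated assumptions: it models an assignment as a map from replica set ID to a list of partition IDs, and the class name AssignmentValidator and its validate method are hypothetical, not Kylin's streaming coordinator API or the validation that was committed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Kylin code: validates a cube re-assignment.
final class AssignmentValidator {
    private AssignmentValidator() {}

    // assignment: replica set ID -> partition IDs (assumed shape).
    // Returns one message per violation; an empty list means the assignment is valid.
    static List<String> validate(Map<Integer, List<Integer>> assignment) {
        List<String> errors = new ArrayList<>();
        Map<Integer, Integer> ownerOfPartition = new HashMap<>();
        // Sort by replica set ID so messages come out in a stable order.
        Map<Integer, List<Integer>> ordered = new TreeMap<>(assignment);
        for (Map.Entry<Integer, List<Integer>> entry : ordered.entrySet()) {
            Integer replicaSet = entry.getKey();
            List<Integer> partitions = entry.getValue();
            if (partitions == null || partitions.isEmpty()) {
                // Case 2: a replica set left with no partitions.
                errors.add("replica set " + replicaSet + " has no partitions");
                continue;
            }
            for (Integer partition : partitions) {
                Integer previous = ownerOfPartition.putIfAbsent(partition, replicaSet);
                if (previous != null) {
                    // Case 1: the partition already belongs to another replica set.
                    errors.add("partition " + partition + " is assigned to replica sets "
                            + previous + " and " + replicaSet);
                }
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> assignment = new HashMap<>();
        assignment.put(0, Arrays.asList(0, 1));
        assignment.put(1, Arrays.asList(1));
        assignment.put(2, Collections.<Integer>emptyList());
        for (String error : validate(assignment)) {
            System.out.println(error);
        }
    }
}
```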





[jira] [Commented] (KYLIN-4955) fix typo in KYLIN UI when not set dictionary for count_distinct measure

2021-06-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370177#comment-17370177
 ] 

ASF subversion and git services commented on KYLIN-4955:


Commit 4c523b13b90477f143fcc20178736a7a2367ebf0 in kylin's branch 
refs/heads/kylin-on-parquet-v2 from yangjiang
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=4c523b1 ]

[KYLIN-4955] fix typo in KYLIN UI when not set dictionary for count_distinct 
measure.

(cherry picked from commit 49210310859bee203dfb8f975c3f630b211f45dc)


> fix typo in KYLIN UI when not set dictionary for count_distinct measure
> ---
>
> Key: KYLIN-4955
> URL: https://issues.apache.org/jira/browse/KYLIN-4955
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Reporter: JiangYang
>Assignee: JiangYang
>Priority: Minor
> Fix For: v3.1.2
>
> Attachments: image-2021-04-07-21-12-44-396.png
>
>
> When creating a count_distinct measure without setting a dictionary: 
> !image-2021-04-07-21-12-44-396.png!





[GitHub] [kylin] hit-lacus merged pull request #1663: Cherry-pick the commit of master branch to kylin-on-parquet-v2 branch

2021-06-27 Thread GitBox


hit-lacus merged pull request #1663:
URL: https://github.com/apache/kylin/pull/1663


   






[jira] [Commented] (KYLIN-4983) The stream cube will be paused when user append a batch segment first

2021-06-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370169#comment-17370169
 ] 

ASF GitHub Bot commented on KYLIN-4983:
---

hit-lacus merged pull request #1644:
URL: https://github.com/apache/kylin/pull/1644


   




> The stream cube will be paused when user append a batch segment first
> -
>
> Key: KYLIN-4983
> URL: https://issues.apache.org/jira/browse/KYLIN-4983
> Project: Kylin
>  Issue Type: Bug
>  Components: Real-time Streaming
>Reporter: Kun Liu
>Assignee: Kun Liu
>Priority: Blocker
> Fix For: v3.1.3
>
>
> Env:
> A stream cube (stream_cube) with lambda; the cube window is 1 hour.
> Before enabling stream_cube, we submit a build REST request with the 
> range [2025-01-01,2025-01-02].
>  
> Then we enable stream_cube. The receiver node consumes data and creates 
> segments in real time, but the start time of each new segment is necessarily 
> earlier than `2025-01-01`.
>  
> When the receiver node uploads the segment data to HDFS and notifies the 
> coordinator, the coordinator deletes the metadata of the stream segment as in 
> the code below, but the segment data cannot be deleted on the receiver 
> node.
>  
> ```
> // If a historical segment already exists, we should not let a new realtime
> // segment overwrite it; that is dangerous.
> // We just delete the entry to ignore the segment, which should not exist.
> if (segmentRange.getFirst() < minSegmentStart) {
>     logger.warn(
>             "The cube segment state is not correct because it belongs to historical part, cube:{} segment:{}, clear it.",
>             cubeName, segmentState.getSegmentName());
>     coordinator.getStreamMetadataStore().removeSegmentBuildState(cubeName, segmentState.getSegmentName());
>     continue;
> }
> ```
> The number of immutable segments will then reach 100.
>  
> There are two ways to resolve this issue:
>  
>  # forbid appending segments to a stream cube with lambda through the RESTful API
>  # delete the local data and the remote HDFS data before removing the metadata
> For now I think the better way is to forbid appending segments in a lambda 
> stream cube.
>  
>  
>  
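
The first option, rejecting append requests for a lambda stream cube, could be expressed as a small rule checked before a build job is submitted. The sketch below is illustrative only: LambdaAppendRule, JobType, and the boolean flags are hypothetical names chosen for this example, not Kylin's job-submission API or the code merged in the pull request.

```java
// Hypothetical sketch, not Kylin code: a pre-submit rule for build requests.
final class LambdaAppendRule {
    private LambdaAppendRule() {}

    // Assumed job kinds; APPEND stands for a request that adds a new batch segment.
    enum JobType { APPEND, MERGE, REFRESH }

    // Only appending a segment to a streaming cube that runs in lambda mode is forbidden.
    static boolean isAllowed(boolean streamingCube, boolean lambdaEnabled, JobType type) {
        return !(streamingCube && lambdaEnabled && type == JobType.APPEND);
    }

    // Throws when the request violates the rule, so the caller can reject it early.
    static void check(String cubeName, boolean streamingCube, boolean lambdaEnabled, JobType type) {
        if (!isAllowed(streamingCube, lambdaEnabled, type)) {
            throw new IllegalArgumentException(
                    "Appending a segment is not allowed for lambda stream cube " + cubeName);
        }
    }

    public static void main(String[] args) {
        check("batch_cube", false, false, JobType.APPEND); // accepted
        try {
            check("stream_cube", true, true, JobType.APPEND);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Rejecting the request up front avoids the state described above, where segment metadata is removed while the segment data stays on the receiver node.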





[GitHub] [kylin] hit-lacus merged pull request #1644: [KYLIN-4983]add rule for submit build job: forbid submitting appending segment in the lambda stream cube

2021-06-27 Thread GitBox


hit-lacus merged pull request #1644:
URL: https://github.com/apache/kylin/pull/1644


   

