date:20210426

[jira] [Closed] (KYLIN-4979) Fix flink shaded jar version error in download-flink.sh

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4979.
---

Released at kylin 3.1.2

> Fix flink shaded jar version error in download-flink.sh
> ---
>
> Key: KYLIN-4979
> URL: https://issues.apache.org/jira/browse/KYLIN-4979
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v3.1.1
>Reporter: Yaqian Zhang
>Assignee: Yaqian Zhang
>Priority: Minor
> Fix For: v3.1.2
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Closed] (KYLIN-4955) fix typo in KYLIN UI when not set dictionary for count_distinct measure

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4955.
---

Released at kylin 3.1.2

> fix typo in KYLIN UI when not set dictionary for count_distinct measure
> ---
>
> Key: KYLIN-4955
> URL: https://issues.apache.org/jira/browse/KYLIN-4955
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Reporter: JiangYang
>Assignee: JiangYang
>Priority: Minor
> Fix For: v3.1.2
>
> Attachments: image-2021-04-07-21-12-44-396.png
>
>
> When creating a count_distinct measure without setting a dictionary: 
> !image-2021-04-07-21-12-44-396.png!




[jira] [Closed] (KYLIN-4920) stream lambda: hive table can be in database other than default

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4920.
---

Released at kylin 3.1.2

> stream lambda: hive table can be in database other than default
> ---
>
> Key: KYLIN-4920
> URL: https://issues.apache.org/jira/browse/KYLIN-4920
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Kun Liu
>Assignee: Kun Liu
>Priority: Minor
> Fix For: v3.1.2
>
>
> Currently, to create a real-time stream table in lambda mode, the Hive table 
> must be in the default database or the configured database.
> But some user tables are in databases other than default.
>  
> I will fix it.




[jira] [Closed] (KYLIN-4929) Skip metrics update for simple queries to avoid NPE warnings

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4929.
---

Released at kylin 3.1.2

> Skip metrics update for simple queries to avoid NPE warnings
> 
>
> Key: KYLIN-4929
> URL: https://issues.apache.org/jira/browse/KYLIN-4929
> Project: Kylin
>  Issue Type: Improvement
>  Components: Metrics
>Reporter: Congling Xia
>Assignee: Congling Xia
>Priority: Minor
> Fix For: v3.1.2
>
>
> Users may use simple queries like 'select 1' to check the availability of the 
> Kylin service. No realization is needed for such queries, so the metrics system 
> raises a NullPointerException when it tries to get the name of the realization.
> This does not cause the query to fail, but it prints many warning 
> logs with stack traces.
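
The guard described above can be sketched in a few lines. This is only an illustration in Python, not Kylin's Java code; QueryContext, Realization and update_metrics are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Realization:
    # Hypothetical stand-in for the cube/realization that answered a query.
    name: str


@dataclass
class QueryContext:
    # Hypothetical query context; realization stays None for queries
    # such as 'select 1' that need no realization.
    sql: str
    realization: Optional[Realization] = None


def update_metrics(ctx: QueryContext, recorded: List[str]) -> bool:
    """Record the realization name, skipping queries that used none."""
    if ctx.realization is None:
        # Nothing to record; returning early avoids dereferencing None.
        return False
    recorded.append(ctx.realization.name)
    return True
```

With this guard, a health-check query is skipped quietly instead of producing a warning with a stack trace.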




[jira] [Closed] (KYLIN-4930) unexpected empty search result in group/user management page

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4930.
---

Released at kylin 3.1.2

> unexpected empty search result in group/user management page
> 
>
> Key: KYLIN-4930
> URL: https://issues.apache.org/jira/browse/KYLIN-4930
> Project: Kylin
>  Issue Type: Bug
>  Components: Web 
>Reporter: Congling Xia
>Assignee: Congling Xia
>Priority: Minor
> Fix For: v3.1.2
>
> Attachments: image-2021-03-10-22-19-03-858.png, 
> image-2021-03-10-22-20-19-633.png
>
>
> When searching for a user/group on the '/kylin/admin' page, an unexpected 
> empty page may be shown. For example, when the user has navigated to page 2 of the 
> user list: 
> !image-2021-03-10-22-19-03-858.png|width=581,height=162!
> after the search is done, the page shows:
> !image-2021-03-10-22-20-19-633.png|width=579,height=104!
> No user is listed on the page, even though at least one user matches the 
> search. 




[jira] [Closed] (KYLIN-4855) kylin metrics prefix bug in system-cube.sh

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4855.
---

Released at kylin 3.1.2

> kylin metrics prefix bug in system-cube.sh
> --
>
> Key: KYLIN-4855
> URL: https://issues.apache.org/jira/browse/KYLIN-4855
> Project: Kylin
>  Issue Type: Bug
>  Components: Environment 
>Affects Versions: v2.6.0
> Environment: 2.6.0+
>Reporter: Huajie Wang
>Assignee: Huajie Wang
>Priority: Minor
>  Labels: easyfix
> Fix For: v3.1.2
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> After executing "system-cube.sh cron", the system cube is added to the 
> crontab, but the cube name is wrong and does not match the actual cube. 
> Testing shows that system-cube.sh has a bug in its handling of the Kylin 
> metrics prefix.
> In the system-cube.sh script, the prefix is always "KYLIN". That logic is wrong: the script 
> should read "kylin.metrics.prefix" from kylin.properties. If this item is set, 
> use the value set by the user; if it is not set, fall back to the default value "KYLIN".
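
The lookup rule proposed above (use "kylin.metrics.prefix" when set, otherwise "KYLIN") can be sketched as follows. The real script is a shell script, so this Python version only illustrates the logic; read_properties and metrics_prefix are hypothetical helper names, and the parser handles only simple key=value lines.

```python
def read_properties(text):
    """Parse simple key=value lines, ignoring blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def metrics_prefix(props, default="KYLIN"):
    """Return the user-configured prefix, or the default when unset or empty."""
    value = props.get("kylin.metrics.prefix", "")
    return value if value else default


configured = read_properties("kylin.metrics.prefix=MY_PREFIX\n")
unset = read_properties("# kylin.metrics.prefix=MY_PREFIX\nkylin.env=PROD\n")
print(metrics_prefix(configured))  # MY_PREFIX
print(metrics_prefix(unset))       # KYLIN
```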




[jira] [Closed] (KYLIN-4841) Spark RDD cache is invalid when building with spark engine

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4841.
---

Released at kylin 3.1.2

> Spark RDD cache is invalid when building with spark engine
> --
>
> Key: KYLIN-4841
> URL: https://issues.apache.org/jira/browse/KYLIN-4841
> Project: Kylin
>  Issue Type: Bug
>  Components: Spark Engine
>Affects Versions: v3.1.1
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: v3.1.2
>
>
> Spark RDD cache is invalid when building with spark engine




[jira] [Closed] (KYLIN-4854) the official website document about system cube have some errors

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4854.
---

Released at kylin 3.1.2

> the official website document about system cube have some errors
> 
>
> Key: KYLIN-4854
> URL: https://issues.apache.org/jira/browse/KYLIN-4854
> Project: Kylin
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: v2.6.0
> Environment: kylin 2.6.0 +
>Reporter: Huajie Wang
>Assignee: Yaqian Zhang
>Priority: Minor
> Fix For: v3.1.2
>
> Attachments: WechatIMG23.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
>  
> The official website document about the system cube contains some errors and 
> omissions.
> Link: 
> http://kylin.apache.org/docs/tutorial/setup_systemcube.html#Automatically%20create%20System%20Cube
> The first configuration, "Create System Cube: sh system-cube.sh setup", 
> should be changed to: sh bin/system-cube.sh setup
> The third configuration, "Add crontab job for System Cube: bin/system.sh cron", 
> should be changed to: bin/system-cube.sh cron
> In addition, bin/build-incremental-cube.sh uses the default username and 
> password ADMIN:KYLIN to trigger the rebuild at the end of the script. 
> This needs to be stated clearly in the document: 
> users who have changed the default password must manually modify the 
> username and password in the script.
>  




[jira] [Closed] (KYLIN-4863) dependency cache script files not fully used

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4863.
---

Released at kylin 3.1.2

> dependency cache script files not fully used
> 
>
> Key: KYLIN-4863
> URL: https://issues.apache.org/jira/browse/KYLIN-4863
> Project: Kylin
>  Issue Type: Improvement
>  Components: Environment 
>Affects Versions: v3.1.1
>Reporter: Huajie Wang
>Assignee: Huajie Wang
>Priority: Minor
> Fix For: v3.1.2
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Kylin's startup-related scripts generate dependency cache scripts at the 
> first startup to speed up later startups. However, the 
> dependency cache scripts are not fully used: only some scripts use them. 
> It is recommended to use the cache in all startup scripts to speed up startup 
> further, and to have the generated cache file names start with "." so 
> that the dependency cache scripts are hidden by default.




[jira] [Closed] (KYLIN-4837) optimize CubeMigrationCLI

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4837.
---

Released at kylin 3.1.2

> optimize CubeMigrationCLI
> -
>
> Key: KYLIN-4837
> URL: https://issues.apache.org/jira/browse/KYLIN-4837
> Project: Kylin
>  Issue Type: Improvement
>Reporter: chuxiao
>Assignee: chuxiao
>Priority: Minor
> Fix For: v3.1.2
>
>





[jira] [Closed] (KYLIN-4939) Transform lookup table snapshot from segment level to cube level

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4939.
---

Released at kylin 3.1.2

> Transform lookup table snapshot from segment level to cube level
> 
>
> Key: KYLIN-4939
> URL: https://issues.apache.org/jira/browse/KYLIN-4939
> Project: Kylin
>  Issue Type: New Feature
>Affects Versions: v3.1.2
>Reporter: JiangYang
>Assignee: JiangYang
>Priority: Major
> Fix For: v3.1.2
>
>





[jira] [Closed] (KYLIN-4833) use distcp to control the speed of writing hfile data to hbase cluster

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4833.
---

Released at kylin 3.1.2

> use distcp to control the speed of writing hfile data to hbase cluster
> ---
>
> Key: KYLIN-4833
> URL: https://issues.apache.org/jira/browse/KYLIN-4833
> Project: Kylin
>  Issue Type: Improvement
>  Components: Storage - HBase
>Affects Versions: v3.1.1
>Reporter: fengpod
>Assignee: fengpod
>Priority: Minor
> Fix For: v3.1.2
>
>
> When a large amount of data is written to the HBase cluster at the same time, the cluster 
> load becomes very high, which affects query performance. This PR 
> allows data to be written to Hadoop HDFS during the step "Convert Cuboid 
> Data to HFile"; the HFiles are then transferred to the HBase cluster by 
> DistCp. DistCp controls the write speed so as to reduce the pressure 
> on the cluster. This PR adds a new step "HFile Distcp To HBase" between "Convert 
> Cuboid Data to HFile" and "Load HFile to HBase Table". It looks like this:
> !https://user-images.githubusercontent.com/4843586/100835711-013fae00-34a9-11eb-8de8-e69228ba0991.png!




[jira] [Closed] (KYLIN-4640) StepInfo saved wrong key about flink or spark

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4640.
---

Released at kylin 3.1.2

> StepInfo saved wrong key about flink or spark
> -
>
> Key: KYLIN-4640
> URL: https://issues.apache.org/jira/browse/KYLIN-4640
> Project: Kylin
>  Issue Type: Bug
>  Components: Flink Engine, Spark Engine
>Affects Versions: v3.1.0, v3.1.1
>Reporter: fengpod
>Priority: Minor
> Fix For: v3.1.2
>
> Attachments: image-2020-07-15-11-34-57-583.png, 
> image-2020-07-15-12-01-10-143.png
>
>
> In PatternedLogger.class, PATTERN_FLINK_APP_ID and PATTERN_SPARK_APP_ID have the 
> same pattern "Submitted application (.*)". When the task 
> information is parsed, the wrong key is saved to StepInfo. As shown in the pictures, the wrong 
> key spark_job_id is saved to the Build-Cube-with-Flink step.
> !image-2020-07-15-11-34-57-583.png!
> !image-2020-07-15-12-01-10-143.png!
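
The collision can be reproduced with a small sketch. It is written in Python rather than Kylin's Java, and everything except the pattern text and the key spark_job_id is an assumption: the key flink_job_id, the sample log line and the function names are hypothetical. The second function shows one possible way to avoid the problem (selecting the pattern by engine), which is not necessarily how the issue was fixed in Kylin.

```python
import re

# Both constants carry the same pattern text, as described in the issue.
PATTERN_SPARK_APP_ID = re.compile(r"Submitted application (.*)")
PATTERN_FLINK_APP_ID = re.compile(r"Submitted application (.*)")


def parse_step_info(line):
    """Try patterns in a fixed order; the first match decides the key."""
    info = {}
    for key, pattern in (("spark_job_id", PATTERN_SPARK_APP_ID),
                         ("flink_job_id", PATTERN_FLINK_APP_ID)):
        match = pattern.search(line)
        if match:
            info[key] = match.group(1)
            break
    return info


# Hypothetical alternative: pick the pattern from the engine of the step.
PATTERNS_BY_ENGINE = {
    "spark": ("spark_job_id", PATTERN_SPARK_APP_ID),
    "flink": ("flink_job_id", PATTERN_FLINK_APP_ID),
}


def parse_step_info_for_engine(line, engine):
    key, pattern = PATTERNS_BY_ENGINE[engine]
    match = pattern.search(line)
    return {key: match.group(1)} if match else {}


flink_log = "Submitted application application_0001"  # made-up log line
print(parse_step_info(flink_log))                      # {'spark_job_id': 'application_0001'}
print(parse_step_info_for_engine(flink_log, "flink"))  # {'flink_job_id': 'application_0001'}
```

Because the two patterns are indistinguishable, the order of evaluation alone decides which key is written, which matches the symptom in the screenshots.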




[jira] [Closed] (KYLIN-4787) The script sample.sh cannot automatically switch to the hive database set by the user to create sample hive tables

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4787.
---

Released at kylin 3.1.2

> The script sample.sh cannot automatically switch to the hive database set by 
> the user to create sample hive tables
> --
>
> Key: KYLIN-4787
> URL: https://issues.apache.org/jira/browse/KYLIN-4787
> Project: Kylin
>  Issue Type: Bug
>  Components: Client - CLI
>Affects Versions: v3.1.0, v3.1.1
> Environment: HDP3
>Reporter: Yaqian Zhang
>Assignee: Yaqian Zhang
>Priority: Minor
> Fix For: v3.1.2
>
>
> The script sample.sh uses --database to specify the database in which 
> create_sample_tables.sql is executed, but this parameter has been removed in Hive 3.




[jira] [Closed] (KYLIN-4940) Implement the step of "Extract Dictionary from Global Dictionary" for spark cubing engine

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4940.
---

Released at kylin 3.1.2

> Implement the step of "Extract Dictionary from Global Dictionary" for spark 
> cubing engine
> -
>
> Key: KYLIN-4940
> URL: https://issues.apache.org/jira/browse/KYLIN-4940
> Project: Kylin
>  Issue Type: New Feature
>  Components: Job Engine
>Reporter: JiangYang
>Assignee: JiangYang
>Priority: Major
> Fix For: v3.1.2
>
> Attachments: image-2021-03-19-17-16-39-061.png, 
> image-2021-03-19-17-17-05-463.png
>
>





[jira] [Closed] (KYLIN-4613) add buildCubeCLi as hadoop main class and jobRestClient

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4613.
---

Released at kylin 3.1.2

> add buildCubeCLi as hadoop main class and  jobRestClient 
> -
>
> Key: KYLIN-4613
> URL: https://issues.apache.org/jira/browse/KYLIN-4613
> Project: Kylin
>  Issue Type: Improvement
>  Components: Client - CLI
>Reporter: chuxiao
>Assignee: chuxiao
>Priority: Minor
> Fix For: v3.1.2
>
>
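> Support submitting a job and waiting for it to finish, retrying 3 times on error.
> CubeBuildingCLIV2 is not compatible with the original CLI, because the original depends on the Kylin deployment environment, especially kylin.properties, and can be regarded as a tool for administrators.
> The new CubeBuildingCLIV2 is designed for users: its parameters are specified entirely in the main method, and it does not depend on the Kylin deployment environment. The two therefore cannot be merged.
> JobRestClient is not placed directly in RestClient because they belong to different projects, and putting it there would create a circular dependency.
> A follow-up task is to split the client into a separate module that contains all clients, including the modeling client, and does not depend on other non-essential Kylin modules.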




[jira] [Closed] (KYLIN-4933) Support set cache strength for dict cache

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4933.
---

Released at kylin 3.1.2

> Support set cache strength for dict cache
> -
>
> Key: KYLIN-4933
> URL: https://issues.apache.org/jira/browse/KYLIN-4933
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v3.1.1
>Reporter: ShengJun Zheng
>Assignee: ShengJun Zheng
>Priority: Major
> Fix For: v3.1.2
>
>
> Kylin uses a Guava LoadingCache (LRU cache) to cache dictionaries with 
> soft (SoftReference) cache strength, so cached entries can be collected by JVM GC. 
> For queries over cubes with dict-encoded dimensions, we would always 
> prefer to keep dimension dictionaries in the LRU cache to avoid reloading them 
> from HBase/HDFS, which causes unstable query performance.




[jira] [Closed] (KYLIN-4938) Remove segment by UUID

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4938.
---

Released at kylin 3.1.2

> Remove segment by UUID 
> ---
>
> Key: KYLIN-4938
> URL: https://issues.apache.org/jira/browse/KYLIN-4938
> Project: Kylin
>  Issue Type: New Feature
>  Components: Job Engine
>Affects Versions: v3.1.2
>Reporter: JiangYang
>Assignee: JiangYang
>Priority: Major
> Fix For: v3.1.2
>
>
> Sometimes two segments have the same name, so it must be possible to remove a segment by UUID.




[jira] [Closed] (KYLIN-4901) Query result use diff timezone in real-time stream

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4901.
---

Released at kylin 3.1.2

> Query result use diff timezone in real-time stream
> --
>
> Key: KYLIN-4901
> URL: https://issues.apache.org/jira/browse/KYLIN-4901
> Project: Kylin
>  Issue Type: Bug
>  Components: Real-time Streaming
>Reporter: Kun Liu
>Assignee: Kun Liu
>Priority: Major
> Fix For: v3.1.2
>
> Attachments: Different timezones for displaying results.png, other 
> non-derived time columns.png
>
>
> When I tested the real-time stream with timezone configuration, I found that the 
> query results use different timezone formats.
> For example, I set `kylin.stream.event.timezone` to GMT-1 (after fixing 
> the issue 
> [kylin-4900|https://issues.apache.org/jira/projects/KYLIN/issues/KYLIN-4900?filter=allopenissues]),
>  and pushed some data to Kafka.
>  
> The derived time column is shown in GMT-8 format, but other time/date 
> columns are displayed in GMT+0 format.
>  
> This result is confusing.
>  
> How to reproduce
>  
> The data is produced with `$KYLIN_HOME/bin/kylin.sh 
> org.apache.kylin.source.kafka.util.KafkaSampleProducer --topic 
> kylin_streaming_topic --broker localhost:9092 --interval 1`
>  
> The message template is:
> 2021-02-05 06:32:28,720 INFO [main] util.KafkaSampleProducer:136 : Sending 1 
> message: 
> \{"country":"US","amount":65.78351439157635,"qty":9,"currency":"USD","order_time":1612506748660,"category":"ELECTRONIC","device":"Windows","user":{"gender":"Male","id":"e1f07f05-9eff-46fa-d401-180d0441df13","first_name":"unknown","age":22}}
>  
> The order_time of the first message is 1612506748660, which is 2021-02-05 05:32:28 
> GMT-1 or 2021-02-05 06:32:28 GMT+0
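
The conversion of that order_time can be checked with a short Python sketch. The helper name format_in_offset is made up for this example; only the epoch value and the two offsets come from the issue.

```python
from datetime import datetime, timedelta, timezone

ORDER_TIME_MS = 1612506748660  # order_time of the first message, in milliseconds


def format_in_offset(epoch_ms, offset_hours):
    """Format an epoch-millisecond timestamp in a fixed UTC offset."""
    tz = timezone(timedelta(hours=offset_hours))
    # Integer division drops the milliseconds, which the format does not show.
    return datetime.fromtimestamp(epoch_ms // 1000, tz=tz).strftime("%Y-%m-%d %H:%M:%S")


print(format_in_offset(ORDER_TIME_MS, 0))   # 2021-02-05 06:32:28
print(format_in_offset(ORDER_TIME_MS, -1))  # 2021-02-05 05:32:28
```

The same instant therefore differs by exactly one hour between the two representations, so columns rendered with different offsets will not line up.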




[jira] [Closed] (KYLIN-4921) stream config lost when create table with same table_name in diff project.

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4921.
---

Released at kylin 3.1.2

> stream config lost when create table with same table_name in diff project.
> --
>
> Key: KYLIN-4921
> URL: https://issues.apache.org/jira/browse/KYLIN-4921
> Project: Kylin
>  Issue Type: Bug
>  Components: Real-time Streaming
>Reporter: Kun Liu
>Assignee: Kun Liu
>Priority: Major
> Fix For: v3.1.2
>
>
> In project A, I created a stream table test_table.
> When I create a table test_table in project B, the table can't be created, 
> and the source config of the table in project A is lost. 
>  
> The reason: 
> In the `saveStreamingConfig` function, if creating the table fails, the config 
> is dropped in the finally phase.
>  
>  
> By design: 
> To identify a stream table's `StreamingSourceConfig` in the 
> metastore, Kylin uses the key
> `ResourceStore.STREAMING_V2_RESOURCE_ROOT + "/" + name + ".json"` as the 
> identifier.
> But this design can't work across multiple projects.
> [~hit_lacus] PTAL
>  
> [Design 
> Doc|https://docs.google.com/document/u/1/d/1QfTn5dBcHeY2EAnMUS2V0sqj9dpFQwmVuAZon7wCtns/edit#]
>  
> Design:
> h1. The Rowkey of the Stream source config
>  
> h1. Origin design
>  
> In the project `project_test`, we create a real-time stream table named 
> `stream_table`, which creates two pieces of metadata: the `tableDesc` 
> and the `streamSourceConfig`.
>  
> The `tableDesc` is stored in HBase under the path 
> `table_prefix/tablename--projectname.json`, but the `streamSourceConfig` is 
> stored in HBase under the path `stream_source_prefix/tablename.json`. 
> Creating tables with the same name in different projects is therefore not 
> supported.
>  
> h1. New design
>  
> The new rowkey for storing the `streamSourceConfig` is 
> `stream_source_prefix/tablename--projectname.json`.
>  
> h1. How to deal with the compatibility in kylin
> h2. The type of RowKey
>  * NewRowKey:  `stream_source_prefix/tablename--projectname.json`
>  * OldRowKey:  `stream_source_prefix/tablename.json`
> h2. The type of operation
>  * save source
>  * update source
>  * query/get source
>  * delete
> h3. saveStreamingConfig:
>  * store the stream source config with NewRowKey 
> h3. removeStreamingConfig:
>  * if the source config exists under the NewRowKey, delete the source config 
> using the NewRowKey
>  * if the source config exists under the OldRowKey, delete the source config 
> using the OldRowKey
> h3. updateStreamingConfig:
>  * removeStreamingConfig
>  * saveStreamingConfig
>  
> h3. reloadStreamingConfigLocal/queryStreamingConfig
> Assumption: the stream config must exist.
>  
>  * check the NewRowKey; if the source config exists, return the object.
>  * if the source config doesn't exist there but exists under the 
> OldRowKey:
>  ** get the source config and update its project name
>  ** delete the source config under the OldRowKey and re-save it under 
> the NewRowKey
>  
>  
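
The compatibility rules in the design above can be sketched with a plain dictionary standing in for the metadata store. This is an illustration in Python, not Kylin's Java implementation: the function names mirror the operations listed in the design, the prefix is the placeholder used in the issue text, and the layout of the config dictionary is an assumption.

```python
STREAM_SOURCE_PREFIX = "stream_source_prefix"  # placeholder taken from the issue text


def new_row_key(table, project):
    return f"{STREAM_SOURCE_PREFIX}/{table}--{project}.json"


def old_row_key(table):
    return f"{STREAM_SOURCE_PREFIX}/{table}.json"


def save_streaming_config(store, table, project, config):
    # New configs are always written under the project-qualified key.
    store[new_row_key(table, project)] = dict(config, project=project)


def remove_streaming_config(store, table, project):
    # Delete under whichever key currently holds the config.
    store.pop(new_row_key(table, project), None)
    store.pop(old_row_key(table), None)


def update_streaming_config(store, table, project, config):
    remove_streaming_config(store, table, project)
    save_streaming_config(store, table, project, config)


def get_streaming_config(store, table, project):
    new_key = new_row_key(table, project)
    if new_key in store:
        return store[new_key]
    old_key = old_row_key(table)
    if old_key in store:
        # Migrate on read: set the project name and move to the new key.
        config = dict(store.pop(old_key), project=project)
        store[new_key] = config
        return config
    return None
```

Reading a legacy entry moves it to the new key as a side effect, so old metadata is migrated lazily the first time it is accessed.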




[jira] [Closed] (KYLIN-4896) Cube metadata lost during build

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4896.
---

Released at kylin 3.1.2

> Cube metadata lost during build
> ---
>
> Key: KYLIN-4896
> URL: https://issues.apache.org/jira/browse/KYLIN-4896
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v3.1.1
>Reporter: hejian
>Assignee: Linghui Zeng
>Priority: Major
> Fix For: v3.1.2
>
> Attachments: image-2021-02-03-19-11-09-261.png
>
>
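> {quote}Today the problem of cube metadata being lost occurred again while a cube was being built with distributed build.
> The metadata of this cube was lost when the build reached the fourth step (Build Dimension Dictionary).
> The error log is as follows:
> !image-2021-02-03-19-11-09-261.png!
> The Kylin version is 3.1.1, with kylin.job.scheduler.default=2.
> The rest of the configuration is correct.{quote}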




[jira] [Closed] (KYLIN-4862) Build Cube use two job engine, cause there is no valid state transfer from:ERROR to:SUCCEED

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4862.
---

Released at kylin 3.1.2

> Build Cube use two job engine, cause there is no valid state transfer 
> from:ERROR to:SUCCEED
> ---
>
> Key: KYLIN-4862
> URL: https://issues.apache.org/jira/browse/KYLIN-4862
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Reporter: JiangYang
>Assignee: JiangYang
>Priority: Major
> Fix For: v3.1.2
>
> Attachments: Screen Shot 2020-11-11 at 4.21.38 PM.png
>
>





[jira] [Closed] (KYLIN-4882) config["kylin.engine.spark-fact-distinct"] overwrite in the Cube-level is invalid

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4882.
---

Released at kylin 3.1.2

> config["kylin.engine.spark-fact-distinct"] overwrite in the Cube-level is 
> invalid
> -
>
> Key: KYLIN-4882
> URL: https://issues.apache.org/jira/browse/KYLIN-4882
> Project: Kylin
>  Issue Type: Bug
>  Components: Spark Engine
>Affects Versions: v2.6.2, v2.6.3, v2.6.4, v2.6.5, v2.6.6
>Reporter: QiangZhang
>Assignee: QiangZhang
>Priority: Major
> Fix For: v3.1.2
>
> Attachments: image-2021-01-21-17-56-52-701.png
>
>
> When I overrode the config "kylin.engine.spark-fact-distinct" at the 
> cube level, it didn't work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Closed] (KYLIN-4847) Cuboid to HFile step failed on multiple job server env because of trying to read the metric jar file from the inactive job server's location.

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4847.
---

Released at kylin 3.1.2

> Cuboid to HFile step failed on multiple job server env because of trying to 
> read the metric jar file from the inactive job server's location.
> -
>
> Key: KYLIN-4847
> URL: https://issues.apache.org/jira/browse/KYLIN-4847
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v3.1.0
>Reporter: yoonsung.lee
>Assignee: yoonsung.lee
>Priority: Major
> Fix For: v3.1.2
>
>
> h1. My Cluster Setting
> 1. Version: 3.1.0
>  2. 2 job servers (job & query mode) and 2 query-only servers, each running 
> on a different host machine.
>  3. The Spark engine is used for build jobs.
> h1. Problem Circumstance
> h2. Root cause
> The active job server submits a Spark job to execute `Convert Cuboid Data to 
> HFile`. But the active job server gets an error because a resource for 
> submitting the Spark job has a wrong path that the active job server cannot 
> read.
>  * wrong resource: 
> ${KYLIN_HOME}/tomcat/webapps/kylin/WEB-INF/lib/metrics-core-2.2.0.jar
>  * For this jar file only, ${KYLIN_HOME} is the inactive job server's 
> location.
> This situation occurs in the following two circumstances.
> h2. On build cube
> 1. Send a request to the build API on the inactive job server (exactly: 
> /kylin/api/cubes/${cube_name}/rebuild).
>  2. The inactive job server stores the build task in the metastore.
>  3. The active job server takes the build task and processes it.
>  4. The active job server fails on the `Convert Cuboid Data to HFile` step. 
> **This doesn't occur when I send the build request to the active job server.**
> h2. On merge
> 1. A merge cube job is triggered periodically.
>  2. The active job server takes the merge task and processes it.
>  3. The active job server fails on the `Convert Cuboid Data to HFile` step.
> **This doesn't occur when there is only one job server in the cluster.**
> h1. Progress to solve this.
> I'm trying to find which code sets the metrics-core-2.2.0.jar path incorrectly.
>  So far, my guess is that the following code sets the metrics-core-2.2.0.jar path for 
> the `Cuboid to HFile` Spark job.
>  * 
> [https://github.com/apache/kylin/blob/kylin-3.1.0/storage-hbase/src/main/java/org/apache/kylin/storage/hbase/steps/HBaseSparkSteps.java#L69]
> h1. Questions
> 1. I'm trying to remote-debug with an IDE to confirm my guess, but 
> the breakpoint on that line is not hit at runtime. The code seems to be called 
> during the boot phase. Is that right?
> 2. Are there any hints or ideas for solving this issue, independent of my 
> progress above?




[jira] [Closed] (KYLIN-4838) fix KYLIN-4679 bug

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4838.
---

Released at kylin 3.1.2

> fix KYLIN-4679 bug
> --
>
> Key: KYLIN-4838
> URL: https://issues.apache.org/jira/browse/KYLIN-4838
> Project: Kylin
>  Issue Type: Improvement
>Reporter: chuxiao
>Assignee: chuxiao
>Priority: Major
> Fix For: v3.1.2
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Closed] (KYLIN-4826) The value of config kylin.source.hive.warehouse-dir can not be found

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4826.
---

Released at kylin 3.1.2

> The value of config kylin.source.hive.warehouse-dir can not be found
> 
>
> Key: KYLIN-4826
> URL: https://issues.apache.org/jira/browse/KYLIN-4826
> Project: Kylin
>  Issue Type: Bug
>  Components: Environment 
>Affects Versions: v3.1.1
> Environment: kylin 3.1.1
>Reporter: Huajie Wang
>Priority: Major
> Fix For: v3.1.2
>
> Attachments: WechatIMG841.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
>  
>  When using the global dictionary in Kylin 3.1.1, building the global dictionary 
> fails because "kylin.source.hive.warehouse-dir" cannot be found, 
> resulting in an input path error.
> After Kylin has been started for the first time, later startups use 
> ${dir}/cached-hive-dependency.sh. Currently 
> "kylin.source.hive.warehouse-dir" is not in cached-hive-dependency.sh, so the 
> parameter is null, which causes the bug.
> In addition, whether https://issues.apache.org/jira/browse/KYLIN-4028 is 
> necessary is worth discussing.




[jira] [Closed] (KYLIN-4819) build cube failed when `kylin.metadata.hbase-client-retries-number` great than 1

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4819.
---

Released at kylin 3.1.2

> build cube failed when `kylin.metadata.hbase-client-retries-number` great 
> than 1
> 
>
> Key: KYLIN-4819
> URL: https://issues.apache.org/jira/browse/KYLIN-4819
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v3.1.1
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Fix For: v3.1.2
>
>
> {code:bash}
> 2020-11-11 07:31:49,187 TRACE [Scheduler 2133794029 Job 
> 70c242ce-6756-f77a-4b79-6b75c6ecd884-22265] hbase.HBaseResourceStore:334 : 
> Update row /execute_output/70c242ce-6756-f77a-4b79-6b75c6ecd884-10 from 
> oldTs: 1605051060239, to newTs: 1605051080210, operation result: false
> 2020-11-11 07:31:49,196 ERROR [Scheduler 2133794029 Job 
> 70c242ce-6756-f77a-4b79-6b75c6ecd884-22265] common.MapReduceExecutable:212 : 
> error execute 
> MapReduceExecutable\{id=70c242ce-6756-f77a-4b79-6b75c6ecd884-10, name=Build 
> N-Dimension Cuboid : level 5, state=RUNNING}
> org.apache.kylin.common.persistence.WriteConflictException: Overwriting 
> conflict /execute_output/70c242ce-6756-f77a-4b79-6b75c6ecd884-10, expect old 
> TS 1605051060239, but it is 1605051080210
>  at 
> org.apache.kylin.storage.hbase.HBaseResourceStore.checkAndPutResourceImpl(HBaseResourceStore.java:337)
>  at 
> org.apache.kylin.common.persistence.ResourceStore$6.call(ResourceStore.java:443)
>  at 
> org.apache.kylin.common.persistence.ResourceStore$6.call(ResourceStore.java:440)
>  at 
> org.apache.kylin.common.persistence.ExponentialBackoffRetry.doWithRetry(ExponentialBackoffRetry.java:52)
>  at 
> org.apache.kylin.common.persistence.ResourceStore.checkAndPutResourceWithRetry(ResourceStore.java:440)
>  at 
> org.apache.kylin.common.persistence.ResourceStore.checkAndPutResourceCheckpoint(ResourceStore.java:428)
>  at 
> org.apache.kylin.common.persistence.ResourceStore.checkAndPutResource(ResourceStore.java:422)
>  at 
> org.apache.kylin.common.persistence.ResourceStore.checkAndPutResource(ResourceStore.java:402)
>  at 
> org.apache.kylin.common.persistence.ResourceStore.checkAndPutResource(ResourceStore.java:381)
>  at 
> org.apache.kylin.job.dao.ExecutableDao.writeJobOutputResource(ExecutableDao.java:252)
>  at 
> org.apache.kylin.job.dao.ExecutableDao.updateJobOutput(ExecutableDao.java:426)
>  at 
> org.apache.kylin.job.execution.ExecutableManager.addJobInfo(ExecutableManager.java:570)
>  at 
> org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:177)
>  at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:191)
>  at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>  at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:191)
>  at 
> org.apache.kylin.job.impl.threadpool.DistributedScheduler$JobRunner.run(DistributedScheduler.java:110)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> {code}
> When the HBase cluster has performance problems or regions are moved, Kylin may 
> fail to access HBase. Many of these exceptions can be recovered from by retrying, 
> so I suggested setting the default number of retries to 3 in 
> [KYLIN-4711|https://issues.apache.org/jira/browse/KYLIN-4711].
> However, once retrying is enabled, a WriteConflictException appears in some 
> scenarios; it is caused by the checkAndPut operation.
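
One possible reading of the log above, offered as an assumption rather than a confirmed diagnosis: the stored timestamp already equals newTs when the conflict is reported, which is what you would see if a first checkAndPut was applied but the client retried anyway. The sketch below reproduces that pattern with a plain compare-and-set. The class CheckAndPutRetrySketch and its methods are hypothetical stand-ins, not Kylin's HBaseResourceStore.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a metadata row guarded by a timestamp.
class CheckAndPutRetrySketch {
    private final AtomicLong storedTs = new AtomicLong();

    CheckAndPutRetrySketch(long initialTs) {
        storedTs.set(initialTs);
    }

    // Applies the write only if the stored timestamp still equals expectedOldTs.
    boolean checkAndPut(long expectedOldTs, long newTs) {
        return storedTs.compareAndSet(expectedOldTs, newTs);
    }

    long currentTs() {
        return storedTs.get();
    }

    // Simulates a client whose first attempt was applied but whose response was lost,
    // so it blindly repeats the same conditional write.
    static boolean retryAfterLostResponse(CheckAndPutRetrySketch row, long oldTs, long newTs) {
        row.checkAndPut(oldTs, newTs);        // first attempt is applied
        return row.checkAndPut(oldTs, newTs); // retry: the check against oldTs now fails
    }

    public static void main(String[] args) {
        CheckAndPutRetrySketch row = new CheckAndPutRetrySketch(1605051060239L);
        boolean retryResult = retryAfterLostResponse(row, 1605051060239L, 1605051080210L);
        System.out.println("retry result: " + retryResult + ", stored TS: " + row.currentTs());
    }
}
```

A conditional write is not idempotent, so a retry layer that repeats it unchanged can turn a successful write into a reported conflict.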




[jira] [Closed] (KYLIN-4827) SparkMergingDictionary parallelize not work

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4827.
---

Released at kylin 3.1.2

> SparkMergingDictionary parallelize not work 
> 
>
> Key: KYLIN-4827
> URL: https://issues.apache.org/jira/browse/KYLIN-4827
> Project: Kylin
>  Issue Type: Improvement
>  Components: Spark Engine
>Reporter: JiangYang
>Assignee: JiangYang
>Priority: Major
> Fix For: v3.1.2
>
> Attachments: Completed Stages (2) copy.png, Pasted Graphic 1.png, 
> Pasted Graphic 2.png, Stage Id.png, Storage Environment.png, 
> image-2020-11-25-12-46-05-750.png
>
>





[jira] [Closed] (KYLIN-4771) Query streaming cube - Thread pool of MultiThreadsResultCollector be blocked.

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4771.
---

Released at kylin 3.1.2

> Query streaming cube - Thread pool of MultiThreadsResultCollector be blocked.
> -
>
> Key: KYLIN-4771
> URL: https://issues.apache.org/jira/browse/KYLIN-4771
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine, Real-time Streaming
>Affects Versions: v3.1.0
> Environment: Centos 7.4.1
> hbase 1.2.4
> hive 2.0.1
> hadoop 2.7.2
>Reporter: GuKe
>Assignee: GuKe
>Priority: Major
> Fix For: v3.1.2
>
> Attachments: image-2020-09-23-11-57-15-795.png, 
> image-2020-09-23-11-57-55-413.png
>
>
> When the receiver queries a streaming cube locally, it gets blocked for an 
> unknown reason.
>  As a result, the receiver can't respond to query requests.
> !image-2020-09-23-11-57-15-795.png!
> !image-2020-09-23-11-57-55-413.png!




[jira] [Closed] (KYLIN-4794) Make it possible to force hit a cube set for sqls with cube join

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4794.
---

Released at kylin 3.1.2

> Make it possible to force hit a cube set for sqls with cube join
> 
>
> Key: KYLIN-4794
> URL: https://issues.apache.org/jira/browse/KYLIN-4794
> Project: Kylin
>  Issue Type: New Feature
>Reporter: JiangYang
>Assignee: JiangYang
>Priority: Major
> Fix For: v3.1.2
>
>
> Currently only a single cube can be specified. However, complex SQL queries 
> with multiple subqueries joined across cubes will hit multiple cubes. In this 
> case, we need to specify a cube set rather than a single cube.




[jira] [Closed] (KYLIN-4702) Missing cube-level lookup table snapshot when doing cube migration

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4702.
---

Released at kylin 3.1.2

> Missing cube-level lookup table snapshot when doing cube migration
> --
>
> Key: KYLIN-4702
> URL: https://issues.apache.org/jira/browse/KYLIN-4702
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhong Yanghong
>Assignee: JiangYang
>Priority: Major
> Fix For: v3.1.2
>
>





[jira] [Closed] (KYLIN-4658) Union all issue with regarding to windows function & aggregation on

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4658.
---

Released at kylin 3.1.2

>  Union all issue with regarding to windows function & aggregation on
> 
>
> Key: KYLIN-4658
> URL: https://issues.apache.org/jira/browse/KYLIN-4658
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhong Yanghong
>Assignee: Zhong Yanghong
>Priority: Major
> Fix For: v3.1.2
>
>
> Test SQL:
> {code}
> select CNT, GMV, sum(GMV) over(partition by SLR_SEGMENT_CD) TOTAL_GMV, 
> SLR_SEGMENT_CD, LSTG_FORMAT_NAME
> from 
> (select sum(PRICE) GMV, sum(ITEM_COUNT) CNT, SLR_SEGMENT_CD, LSTG_FORMAT_NAME 
> from TEST_KYLIN_FACT group by SLR_SEGMENT_CD, LSTG_FORMAT_NAME 
> UNION ALL
> select sum(PRICE) GMV, sum(ITEM_COUNT) CNT, SLR_SEGMENT_CD, LSTG_FORMAT_NAME 
> from TEST_KYLIN_FACT group by SLR_SEGMENT_CD, LSTG_FORMAT_NAME) 
> order by TOTAL_GMV
> {code}
>  
> Exception:
> {code}
> Index: 2, Size: 2 while executing SQL: "select * from (select CNT, GMV, 
> sum(GMV) over(partition by SLR_SEGMENT_CD) TOTAL_GMV, SLR_SEGMENT_CD, 
> LSTG_FORMAT_NAME from (select sum(PRICE) GMV, sum(ITEM_COUNT) CNT, 
> SLR_SEGMENT_CD, LSTG_FORMAT_NAME from TEST_KYLIN_FACT group by 
> SLR_SEGMENT_CD, LSTG_FORMAT_NAME UNION ALL select sum(PRICE) GMV, 
> sum(ITEM_COUNT) CNT, SLR_SEGMENT_CD, LSTG_FORMAT_NAME from TEST_KYLIN_FACT 
> group by SLR_SEGMENT_CD, LSTG_FORMAT_NAME) order by TOTAL_GMV) limit 5"
> {code}
> Similar issue for the following sql:
> {code}
> select LSTG_FORMAT_NAME,
>SLR_SEGMENT_CD,
>CAL_DT,
>sum(CNT) as CNT
> from
>   (select LSTG_FORMAT_NAME,
>   SLR_SEGMENT_CD,
>   CAL_DT,
>   sum(ITEM_COUNT) CNT
>from TEST_KYLIN_FACT
>where LSTG_FORMAT_NAME = 'ABIN'
>group by LSTG_FORMAT_NAME,
> SLR_SEGMENT_CD,
> CAL_DT
>UNION ALL select 'NON-ABIN' as LSTG_FORMAT_NAME,
> SLR_SEGMENT_CD,
> CAL_DT,
> case
> when SLR_SEGMENT_CD > 1000 then CNT * 2
> else CNT * 3
> end as CNT
>from
>  (select SLR_SEGMENT_CD,
>  CAL_DT,
>  sum(ITEM_COUNT) CNT
>   from TEST_KYLIN_FACT
>   where LSTG_FORMAT_NAME <> 'ABIN'
>   group by SLR_SEGMENT_CD,CAL_DT))
> group by LSTG_FORMAT_NAME,
>  SLR_SEGMENT_CD,
>  CAL_DT
> order by CNT
> {code}




[jira] [Closed] (KYLIN-4879) The function of sql to remove comments is not perfect. In some cases, the sql query conditions used will be modified

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4879.
---

Released at kylin 3.1.2

> The function of sql to remove comments is not perfect. In some cases, the sql 
> query conditions used will be modified
> 
>
> Key: KYLIN-4879
> URL: https://issues.apache.org/jira/browse/KYLIN-4879
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.6.0
>Reporter: wangjie
>Assignee: Yaqian Zhang
>Priority: Critical
> Fix For: v3.1.2, v4.0.0-GA
>
>
> In the removeCommentInSql method of QueryUtil in the query module, if a 
> single-quoted string in the user's SQL contains -- or /**/, the regular 
> expression rewrites the SQL query condition.
> E.g.:
> (1) The single-quoted string contains --, followed by a line break
> {quote}String sql = "select count(*) from test_kylin_fact WHERE column_name 
> ='--this is not comment'\n "+ "LIMIT 100 offset 0";
> {quote}
> After the removeCommentInSql method, it becomes:
> {quote}select count(*) from test_kylin_fact WHERE column_name = 'LIMIT 100 
> offset 0
> {quote}
> (2) The single-quoted string contains a multi-line comment marker
> {quote}String sql = "select count(*) from test_kylin_fact WHERE column_name 
> ='/**--this *is not comment***/'";
> {quote}
> After the removeCommentInSql method, it becomes:
> {quote}select count(*) from test_kylin_fact WHERE column_name =''
> {quote}
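
For illustration only, here is a minimal sketch of comment removal that skips over single-quoted literals instead of relying on a regular expression. It is not the fix that was committed for this issue; the class QuoteAwareCommentStripper and its strip method are hypothetical names, and the sketch assumes '' is the only quote escape inside a literal.

```java
// Hypothetical helper; not Kylin's QueryUtil.removeCommentInSql.
class QuoteAwareCommentStripper {

    static String strip(String sql) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        int n = sql.length();
        while (i < n) {
            char c = sql.charAt(i);
            if (c == '\'') {
                // Copy the whole string literal verbatim; '' inside it is an escaped quote.
                int j = i + 1;
                while (j < n) {
                    if (sql.charAt(j) == '\'') {
                        if (j + 1 < n && sql.charAt(j + 1) == '\'') {
                            j += 2;
                            continue;
                        }
                        break;
                    }
                    j++;
                }
                int end = Math.min(j + 1, n);
                out.append(sql, i, end);
                i = end;
            } else if (c == '-' && i + 1 < n && sql.charAt(i + 1) == '-') {
                // Line comment: skip to the end of the line, keeping the newline itself.
                while (i < n && sql.charAt(i) != '\n') {
                    i++;
                }
            } else if (c == '/' && i + 1 < n && sql.charAt(i + 1) == '*') {
                // Block comment: skip to the closing marker, or to the end if it is missing.
                int close = sql.indexOf("*/", i + 2);
                i = (close < 0) ? n : close + 2;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sql = "select count(*) from test_kylin_fact WHERE column_name "
                + "='--this is not comment'\n LIMIT 100 offset 0 -- real comment";
        System.out.println(strip(sql));
    }
}
```

With this approach the two inputs from the description keep their quoted values, while comments outside quotes are still removed.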




[jira] [Closed] (KYLIN-4667) Automatically set kylin.query.cache-signature-enabled to be true when memcached is enabled

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4667.
---

Released at kylin 3.1.2

> Automatically set kylin.query.cache-signature-enabled to be true when 
> memcached is enabled
> --
>
> Key: KYLIN-4667
> URL: https://issues.apache.org/jira/browse/KYLIN-4667
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhong Yanghong
>Assignee: JiangYang
>Priority: Major
> Fix For: v3.1.2
>
>





[jira] [Closed] (KYLIN-4836) fix CubeMigrationCLI bug

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4836.
---

Released at kylin 3.1.2

> fix CubeMigrationCLI bug
> 
>
> Key: KYLIN-4836
> URL: https://issues.apache.org/jira/browse/KYLIN-4836
> Project: Kylin
>  Issue Type: Improvement
>Reporter: chuxiao
>Assignee: chuxiao
>Priority: Critical
> Fix For: v3.1.2
>
>
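> Fix the problem that a cube whose name starts with "tablexxx" is mistakenly 
> migrated as a table. Fix the problem that the model's project information is 
> not updated when the model is not under the same project in the source and 
> target clusters.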




[jira] [Closed] (KYLIN-4636) Make /api/admin/public_config callable for profile saml

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4636.
---

Released at kylin 3.1.2

> Make /api/admin/public_config callable for profile saml
> ---
>
> Key: KYLIN-4636
> URL: https://issues.apache.org/jira/browse/KYLIN-4636
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhong Yanghong
>Assignee: JiangYang
>Priority: Major
> Fix For: v3.1.2
>
>





[jira] [Closed] (KYLIN-4810) TrieDictionary is not correctly build

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4810.
---

Released at kylin 3.1.2

> TrieDictionary is not correctly build
> -
>
> Key: KYLIN-4810
> URL: https://issues.apache.org/jira/browse/KYLIN-4810
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v2.3.2
>Reporter: ShengJun Zheng
>Assignee: ShengJun Zheng
>Priority: Critical
>  Labels: Dictionary
> Fix For: v3.1.2
>
>
> Hi, recently I ran into a problem in our production environment: segments failed 
> to merge because the TrieDictionaryForest was unordered.
> {code:java}
> java.lang.IllegalStateException: Invalid input data. Unordered data cannot be 
> split into multi trees
> at 
> org.apache.kylin.dict.TrieDictionaryForestBuilder.addValue(TrieDictionaryForestBuilder.java:92)
> at 
> org.apache.kylin.dict.TrieDictionaryForestBuilder.addValue(TrieDictionaryForestBuilder.java:78)
> at 
> org.apache.kylin.dict.DictionaryGenerator$StringTrieDictForestBuilder.addValue(DictionaryGenerator.java:214)
> at 
> org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:81)
> at 
> org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:65)
> at 
> org.apache.kylin.dict.DictionaryGenerator.mergeDictionaries(DictionaryGenerator.java:106)
> {code}
> After some analysis, we found that when a dict-encoded column contains very 
> large values, iterating over a single TrieDictionaryTree returns unordered 
> data.
>  
>  Digging into the source code, the root cause is as follows: 
>  # Kylin splits a TrieTree node into two parts when a single node's 
> value is longer than 255 bytes.
>  # These two parts are then added as values when building the TrieTree. In 
> fact, the two split parts should not be added to the TrieTree as new values.
>  # Step 2 causes the TrieDictionaryTree to build more leaf nodes, and the 
> extra leaf nodes become 'end-values' of the dictionary tree.
>  # This does not affect the correctness of the dict tree itself, apart from 
> adding some extra nodes.
>  # But if the split falls inside a UTF-8 character, you get unordered data when 
> iterating over the tree (this has something to do with Java's UTF-8 String 
> serialize/deserialize implementation; please refer to the JDK class 
> sun.nio.cs.UTF_8).
> How to reproduce? Run this test code:
> {code:java}
> TrieDictionaryForestBuilder builder = new TrieDictionaryForestBuilder(new 
> StringBytesConverter());
> String longUrl = 
> "xx你好~~~";
> builder.addValue(longUrl);
> TrieDictionaryForest dict = builder.build();
> TrieDictionaryForestBuilder mergeBuild = new TrieDictionaryForestBuilder(new 
> StringBytesConverter());
> for (int i = dict.getMinId(); i <= dict.getMaxId(); i++) {
>     String str = dict.getValueFromId(i);
>     System.out.println("add value into merge tree");
>     mergeBuild.addValue(str);
> }
> {code}
> The log output of this test code is:
> {code}
> add value into merge tree
> add value into merge tree
> 16:59:36 [main] INFO 
> org.apache.kylin.dict.TrieDictionaryForestBuilder.addValue(TrieDictionaryForestBuilder.java:127)
>  values not in ascending order, previous 
> 'xx\xEF\xBF\xBD',
>  current 
> 'xx\xE4\xBD\xA0\xE5\xA5\xBD~~~'
> {code}
> We can see from the test code's output:
>  # We add only 1 value, but the trie dictionary tree turns out to have 2 end 
> values.
>  # Iterating over the TrieDictionary tree returns unordered data.
> We address this problem by
>  # distinguishing values that are whole column values from values that are 
> split parts, and
>  # not marking a split part as the end-value of a TrieTree node.
> Please let me know if I have got something wrong, thanks.
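
The UTF-8 effect in step 5 can be reproduced without Kylin. The sketch below is a minimal illustration under two assumptions: the cut falls after the first byte of a multi-byte character (here after 3 bytes, rather than at the 255-byte limit), and dictionary values are compared as unsigned bytes. The class Utf8SplitSketch and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration; it does not use any Kylin classes.
class Utf8SplitSketch {

    // Cuts the UTF-8 encoding at byteLen, decodes the prefix to a String and encodes it again.
    static byte[] roundTripPrefix(String value, int byteLen) {
        byte[] full = value.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(full, 0, byteLen, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    // Lexicographic comparison treating each byte as unsigned.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String value = "xx你好~~~";
        byte[] prefix = roundTripPrefix(value, 3); // the cut falls inside the first Chinese character
        byte[] full = value.getBytes(StandardCharsets.UTF_8);
        System.out.println("prefix bytes: " + hex(prefix));
        System.out.println("full bytes:   " + hex(full));
        System.out.println("prefix sorts after full value: " + (compareUnsigned(prefix, full) > 0));
    }
}
```

The truncated byte decodes to the replacement character U+FFFD, which re-encodes as EF BF BD. That is larger than the original lead byte E4, so the split part sorts after the full value. This matches the 'xx\xEF\xBF\xBD' entry in the log.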




[jira] [Closed] (KYLIN-4711) Change default value to 3 for kylin.metadata.hbase-client-retries-number

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4711.
---

Released at kylin 3.1.2

> Change default value to 3 for kylin.metadata.hbase-client-retries-number
> 
>
> Key: KYLIN-4711
> URL: https://issues.apache.org/jira/browse/KYLIN-4711
> Project: Kylin
>  Issue Type: Improvement
>Affects Versions: v3.1.0
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Fix For: v3.1.2
>
>
> ```shell
>  java.lang.RuntimeException: 
> org.apache.kylin.job.exception.PersistentException: 
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
> attempts=1, exceptions:
>  Thu Aug 20 21:06:01 GMT+08:00 2020, RpcRetryingCaller
> {globalStartTime=1597928761253, pause=1000, retries=1}
> , org.apache.hadoop.hbase.NotServingRegionException: 
> org.apache.hadoop.hbase.NotServingRegionException: Region 
> kylin_production_metadata,/execute_output/3adc92f2-edcd-2705-5a9c-ad0afe4a0808-01,1594348337103.48b9e5e9c3c7891750236fcec84b38d5.
>  is not online on xxx.xxx.xxx.xxx,16031,1558009276096
>  at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3033)
>  at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1110)
>  at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2064)
>  at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33857)
>  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2189)
>  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>  at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>  at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>  at java.lang.Thread.run(Thread.java:745)
>  on xxx.xxx.xxx.xxx,16031,1558009276096
>  at 
> org.apache.kylin.job.execution.ExecutableManager.getOutput(ExecutableManager.java:174)
>  at 
> org.apache.kylin.job.execution.AbstractExecutable.getOutput(AbstractExecutable.java:450)
>  at 
> org.apache.kylin.job.execution.AbstractExecutable.isDiscarded(AbstractExecutable.java:561)
>  at 
> org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:165)
>  at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:191)
>  at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>  at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:191)
>  at 
> org.apache.kylin.job.impl.threadpool.DistributedScheduler$JobRunner.run(DistributedScheduler.java:110)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
>  ```
>  Recently, our build jobs failed occasionally. Analysis showed that the 
> failures were caused by errors when accessing the MetaStore. 
> We use HBase as the MetaStore. 
>  When accessing HBase, the client caches the region information of the 
> table. When a region is moved, the client does not actively 
> update the cached information, so it receives a 
> NotServingRegionException and only refreshes the cache when 
> retrying. But the number of retries in Kylin is 1, which means that the 
> client will not try again.
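
The interaction between a stale location cache and the attempt limit can be sketched as follows. This is a simplified simulation, not the HBase client: the class StaleCacheRetrySketch, the server names and the returned row are all hypothetical, and the sketch assumes the cache is refreshed only after a failed attempt.

```java
// Hypothetical simulation of a client with a cached region location.
class StaleCacheRetrySketch {

    static class NotServingRegion extends RuntimeException {
        NotServingRegion(String message) {
            super(message);
        }
    }

    private final String actualServer = "rs-2"; // where the region lives after the move
    private String cachedServer = "rs-1";       // stale location still held by the client

    private String readFrom(String server) {
        if (!server.equals(actualServer)) {
            throw new NotServingRegion("region is not online on " + server);
        }
        return "row-data";
    }

    // Tries up to maxAttempts times; the cached location is refreshed only after a failure.
    String getWithRetries(int maxAttempts) {
        NotServingRegion last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return readFrom(cachedServer);
            } catch (NotServingRegion e) {
                last = e;
                cachedServer = actualServer; // refresh the cache for the next attempt
            }
        }
        throw new IllegalStateException("Failed after attempts=" + maxAttempts, last);
    }

    public static void main(String[] args) {
        System.out.println(new StaleCacheRetrySketch().getWithRetries(3));
        try {
            new StaleCacheRetrySketch().getWithRetries(1);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a single attempt the refreshed location is never used, which is why raising the limit lets the job survive a region move.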




[jira] [Closed] (KYLIN-4962) Fix NPE in ShrunkDict step

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4962.
---

Released at kylin 3.1.2

> Fix NPE in ShrunkDict step
> --
>
> Key: KYLIN-4962
> URL: https://issues.apache.org/jira/browse/KYLIN-4962
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v3.1.2
>Reporter: ShengJun Zheng
>Assignee: ShengJun Zheng
>Priority: Critical
> Fix For: v3.1.2
>
>
> There are some fields in CreateShrunkenDictionary (a Spark function) that 
> cannot be serialized, which causes an NPE.
>  
> Caused by: java.lang.NullPointerException
>  at org.apache.kylin.common.KylinConfig.getManager(KylinConfig.java:474)
>  at org.apache.kylin.common.KylinConfig.getManager(KylinConfig.java:472)
>  at org.apache.kylin.cube.CubeDescManager.getInstance(CubeDescManager.java:67)
>  at org.apache.kylin.cube.CubeInstance.getDescriptor(CubeInstance.java:212)
>  at org.apache.kylin.cube.CubeSegment.getCubeDesc(CubeSegment.java:142)
>  at 
> org.apache.kylin.cube.CubeSegment.buildGlobalDictionaryMap(CubeSegment.java:386)
>  at 
> org.apache.kylin.engine.spark.SparkCubingByLayer$CreateShrunkenDictionary.call(SparkCubingByLayer.java:592)
>  at 
> org.apache.kylin.engine.spark.SparkCubingByLayer$CreateShrunkenDictionary.call(SparkCubingByLayer.java:519)




[jira] [Closed] (KYLIN-4900) The result of derived time columns are error, when timezone is GMT-1 or GMT-N

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4900.
---

Released at kylin 3.1.2

> The result of derived time columns are error, when timezone is GMT-1 or GMT-N
> -
>
> Key: KYLIN-4900
> URL: https://issues.apache.org/jira/browse/KYLIN-4900
> Project: Kylin
>  Issue Type: Bug
>  Components: Real-time Streaming
>Reporter: Kun Liu
>Assignee: Kun Liu
>Priority: Blocker
> Fix For: v3.1.2
>
> Attachments: day_start is error.png, result- day_start.png
>
>
> When the configuration `kylin.stream.event.timezone` is set to GMT-1 or 
> GMT-N, the result of DAY_START is wrong.
>  
> The data is produced with `$KYLIN_HOME/bin/kylin.sh 
> org.apache.kylin.source.kafka.util.KafkaSampleProducer --topic 
> kylin_streaming_topic --broker localhost:9092 --interval 1`
>  
> The message template is:
> 2021-02-05 06:32:28,720 INFO [main] util.KafkaSampleProducer:136 : Sending 1 
> message: 
> \{"country":"US","amount":65.78351439157635,"qty":9,"currency":"USD","order_time":1612506748660,"category":"ELECTRONIC","device":"Windows","user":{"gender":"Male","id":"e1f07f05-9eff-46fa-d401-180d0441df13","first_name":"unknown","age":22}}
>  
> The order_time of the first message is 1612506748660, which is 2021-02-05 14:32:28 
> GMT+8 or 2021-02-05 05:32:28 GMT-1
>  
> The query result is in the attachments.
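
To make the expected values concrete, the sketch below computes the start of the local day for the order_time above in two zones using java.time. It assumes that DAY_START means local midnight in the configured time zone; the class DayStartSketch and the method dayStartMillis are hypothetical and not part of Kylin.

```java
import java.time.Instant;
import java.time.ZoneId;

// Hypothetical helper for checking derived day-start values by hand.
class DayStartSketch {

    // Returns the epoch milliseconds of local midnight for the day that contains epochMillis.
    static long dayStartMillis(long epochMillis, ZoneId zone) {
        return Instant.ofEpochMilli(epochMillis)
                .atZone(zone)
                .toLocalDate()
                .atStartOfDay(zone)
                .toInstant()
                .toEpochMilli();
    }

    public static void main(String[] args) {
        long orderTime = 1612506748660L;
        for (String id : new String[] {"GMT-1", "GMT+8"}) {
            ZoneId zone = ZoneId.of(id);
            long dayStart = dayStartMillis(orderTime, zone);
            System.out.println(id + ": local time " + Instant.ofEpochMilli(orderTime).atZone(zone).toLocalDateTime()
                    + ", day start " + dayStart);
        }
    }
}
```

For GMT-1 the local day is 2021-02-05 and its midnight is 01:00 UTC, so a correct DAY_START for this message should correspond to 1612486800000 under the assumption above.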




[GitHub] [kylin] hit-lacus merged pull request #1647: Prepare download and release note for Kylin 3.1.2

2021-04-26 Thread GitBox



hit-lacus merged pull request #1647:
URL: https://github.com/apache/kylin/pull/1647


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [kylin] hit-lacus opened a new pull request #1647: Prepare download and release note for Kylin 3.1.2

2021-04-26 Thread GitBox



hit-lacus opened a new pull request #1647:
URL: https://github.com/apache/kylin/pull/1647





[jira] [Commented] (KYLIN-4971) Add new measure bitmap_map for count distinct measure in UI

2021-04-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/KYLIN-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17331991#comment-17331991
 ] 

ASF GitHub Bot commented on KYLIN-4971:
---

Ted-Jiang commented on pull request #1633:
URL: https://github.com/apache/kylin/pull/1633#issuecomment-826718400


   > @Ted-Jiang Nice! Can you merge these two commits into one commit?
   
   sure,fix




> Add new measure bitmap_map for count distinct  measure in UI
> 
>
> Key: KYLIN-4971
> URL: https://issues.apache.org/jira/browse/KYLIN-4971
> Project: Kylin
>  Issue Type: Task
>Reporter: JiangYang
>Assignee: JiangYang
>Priority: Minor
> Fix For: v3.1.3
>
>





[GitHub] [kylin] Ted-Jiang commented on pull request #1633: [KYLIN-4971] Add new measure bitmap_map for count distinct at fronten…

2021-04-26 Thread GitBox



Ted-Jiang commented on pull request #1633:
URL: https://github.com/apache/kylin/pull/1633#issuecomment-826718400


   > @Ted-Jiang Nice! Can you merge these two commits into one commit?
   
   sure,fix



[jira] [Comment Edited] (KYLIN-4990) When building the global dictionary table with Hive, let MR read files from a specified location instead of the HDFS root directory

2021-04-26 Thread Xiaoxiang Yu (Jira)



[ 
https://issues.apache.org/jira/browse/KYLIN-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17331931#comment-17331931
 ] 

Xiaoxiang Yu edited comment on KYLIN-4990 at 4/26/21, 9:28 AM:
---

Hello [~linlin994395], this is quite a complex situation. Could you contact me via 
WeChat so we can discuss it directly? My WeChat ID is "hit-lacus".


was (Author: xxyu):
Hello [~linlin994395], this is quite a complex situation. Could you contact me via 
WeChat so we can discuss it directly? 

> When building the global dictionary table with Hive, let MR read files from a 
> specified location instead of the HDFS root directory
> 
>
> Key: KYLIN-4990
> URL: https://issues.apache.org/jira/browse/KYLIN-4990
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v3.1.1
>Reporter: xue lin
>Priority: Major
> Attachments: s3-hive-全局字典表.png
>
>
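> I followed the documents below to build the Hive global dictionary table for a 
> case involving bitmap:
> [http://kylin.apache.org/cn/docs/howto/howto_use_hive_mr_dict.html]
> [https://cwiki.apache.org/confluence/display/KYLIN/Introduction+to+Hive+Global+Dictionary]
> https://issues.apache.org/jira/browse/KYLIN-4616
> Ideally, I would like to keep all the tables on S3. With the following 
> configuration:
> ---
> # kylin_hive_conf.xml
> <property>
>  <name>hive.metastore.warehouse.dir</name>
>  <value>s3://etl-script-product/hive-kylin-dict</value>
>  <description>location of default database for the warehouse</description>
> </property>
> ---
> See the attachment for how the tables are stored on S3.
> However, when Kylin reaches the step "Build Hive Global Dict - parallel part 
> build", it fails with the following error: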
> ---
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does 
> not exist: 
> hdfs://ip-10-50-69-202.eu-west-1.compute.internal:8020/kylin_intermediate_cube_fact_remain_dc1531fe_0197_4ab1_a2d5_fe6d6629bb09_distinct_value/dict_column=VIEW_FACT_REMAIN_ID
>  at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
>  at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:271)
>  at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:358)
>  at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:303)
>  at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320)
>  at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:198)
>  at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
>  at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>  at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
>  at 
> org.apache.kylin.engine.mr.common.AbstractHadoopJob.waitForCompletion(AbstractHadoopJob.java:198)
>  at 
> org.apache.kylin.engine.mr.steps.BuildGlobalHiveDictPartBuildJob.run(BuildGlobalHiveDictPartBuildJob.java:109)
>  at 
> org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:155)
>  at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>  at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>  at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>  at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:113)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> ---
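> The problem can be worked around by changing hive.metastore.warehouse.dir as 
> follows:
> ---
> # kylin_hive_conf.xml
> <property>
>  <name>hive.metastore.warehouse.dir</name>
>  <value>/</value>
>  <description>location of default database for the warehouse</description>
> </property>
> ---
> Is there a parameter that changes the path MR reads files from during "Build 
> Hive Global Dict - parallel part build"?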




[jira] [Commented] (KYLIN-4990) When building the global dictionary table with Hive, let MR read files from a specified location instead of the HDFS root directory

2021-04-26 Thread Xiaoxiang Yu (Jira)



[ 
https://issues.apache.org/jira/browse/KYLIN-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17331931#comment-17331931
 ] 

Xiaoxiang Yu commented on KYLIN-4990:
-

Hello [~linlin994395], this is quite a complex situation. Could you contact me via 
WeChat so we can discuss it directly? 

> When building the global dictionary table with Hive, let MR read files from a 
> specified location instead of the HDFS root directory
> 
>
> Key: KYLIN-4990
> URL: https://issues.apache.org/jira/browse/KYLIN-4990
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v3.1.1
>Reporter: xue lin
>Priority: Major
> Attachments: s3-hive-全局字典表.png
>
>
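> I followed the documents below to build the Hive global dictionary table for a 
> case involving bitmap:
> [http://kylin.apache.org/cn/docs/howto/howto_use_hive_mr_dict.html]
> [https://cwiki.apache.org/confluence/display/KYLIN/Introduction+to+Hive+Global+Dictionary]
> https://issues.apache.org/jira/browse/KYLIN-4616
> Ideally, I would like to keep all the tables on S3. With the following 
> configuration:
> ---
> # kylin_hive_conf.xml
> <property>
>  <name>hive.metastore.warehouse.dir</name>
>  <value>s3://etl-script-product/hive-kylin-dict</value>
>  <description>location of default database for the warehouse</description>
> </property>
> ---
> See the attachment for how the tables are stored on S3.
> However, when Kylin reaches the step "Build Hive Global Dict - parallel part 
> build", it fails with the following error: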
> ---
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does 
> not exist: 
> hdfs://ip-10-50-69-202.eu-west-1.compute.internal:8020/kylin_intermediate_cube_fact_remain_dc1531fe_0197_4ab1_a2d5_fe6d6629bb09_distinct_value/dict_column=VIEW_FACT_REMAIN_ID
>  at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
>  at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:271)
>  at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:358)
>  at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:303)
>  at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320)
>  at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:198)
>  at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
>  at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>  at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
>  at 
> org.apache.kylin.engine.mr.common.AbstractHadoopJob.waitForCompletion(AbstractHadoopJob.java:198)
>  at 
> org.apache.kylin.engine.mr.steps.BuildGlobalHiveDictPartBuildJob.run(BuildGlobalHiveDictPartBuildJob.java:109)
>  at 
> org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:155)
>  at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>  at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>  at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>  at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:113)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> ---
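> The error can be worked around by changing the hive.metastore.warehouse.dir parameter as follows:
> ---
> # kylin_hive_conf.xml
> <property>
>  <name>hive.metastore.warehouse.dir</name>
>  <value>/</value>
>  <description>location of default database for the warehouse</description>
> </property>
> ---
> Is there a parameter that changes the path the MR job reads files from during the "Build Hive Global Dict - parallel part build" step?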
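
The stack trace above is consistent with how Hadoop qualifies paths: a path without a scheme is resolved against the cluster's default filesystem (fs.defaultFS), so a table directory that carries no explicit S3 prefix ends up under the HDFS root. The Python sketch below only illustrates that resolution rule. The helper name `qualify` and the shortened NameNode address are assumptions made for illustration; this is not Kylin or Hadoop code.

```python
from urllib.parse import urlparse


def qualify(path: str, default_fs: str) -> str:
    """Return `path` unchanged if it already has a scheme,
    otherwise resolve it against the default filesystem URI."""
    if urlparse(path).scheme:
        return path
    return default_fs.rstrip("/") + "/" + path.lstrip("/")


# A scheme-less table directory lands on the default filesystem (HDFS here).
print(qualify("/kylin_intermediate_t/dict_column=A", "hdfs://namenode:8020"))
# A fully qualified S3 location is left alone.
print(qualify("s3://etl-script-product/hive-kylin-dict/t", "hdfs://namenode:8020"))
```

Under this reading, the job would only look on S3 if the input path it builds already includes the s3:// prefix.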




[jira] [Closed] (KYLIN-4574) Try auto detect "kylin.source.hive.databasedir" for dict table

2021-04-26 Thread Xiaoxiang Yu (Jira)



 [ 
https://issues.apache.org/jira/browse/KYLIN-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu closed KYLIN-4574.
---
Resolution: Duplicate

> Try auto detect "kylin.source.hive.databasedir" for dict table
> --
>
> Key: KYLIN-4574
> URL: https://issues.apache.org/jira/browse/KYLIN-4574
> Project: Kylin
>  Issue Type: Sub-task
>  Components: Measure - Count Distinct
>Reporter: Xiaoxiang Yu
>Assignee: Xiaoxiang Yu
>Priority: Minor
> Fix For: Future
>
>
> "kylin.source.hive.databasedir" can be auto detected by 
>  - Hive configuration "hive.metastore.warehouse.dir" 
>  - Kylin configuration "kylin.dictionary.mr-hive.database"
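
One way the detection described above could work, as a minimal sketch: by default Hive stores the `default` database directly under hive.metastore.warehouse.dir and every other database under `<warehouse>/<name>.db`. The function name `detect_database_dir` and the example database `kylin_dict` are assumptions for illustration, not Kylin code, and a database created with an explicit LOCATION would not follow this rule.

```python
def detect_database_dir(warehouse_dir: str, database: str) -> str:
    """Guess the directory of a Hive database from the warehouse root.

    Assumes Hive's default layout: `default` lives at the warehouse root,
    other databases at `<warehouse>/<lower-cased name>.db`.
    """
    base = warehouse_dir.rstrip("/")
    if database.lower() == "default":
        return base or "/"
    return f"{base}/{database.lower()}.db"


print(detect_database_dir("s3://etl-script-product/hive-kylin-dict", "default"))
print(detect_database_dir("/user/hive/warehouse/", "kylin_dict"))
```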




[jira] [Commented] (KYLIN-4980) Support prunning segments from complex filter conditions

2021-04-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/KYLIN-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17331841#comment-17331841
 ] 

ASF GitHub Bot commented on KYLIN-4980:
---

zzcclp commented on pull request #1642:
URL: https://github.com/apache/kylin/pull/1642#issuecomment-826353467


   LGTM


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support prunning segments from complex filter conditions
> 
>
> Key: KYLIN-4980
> URL: https://issues.apache.org/jira/browse/KYLIN-4980
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v4.0.0-beta
>Reporter: ShengJun Zheng
>Assignee: ShengJun Zheng
>Priority: Major
> Fix For: v4.0.0-GA
>
>
> Segment pruner can't prune segment from complex filter conditions, like the 
> filter condition below:
> "where (col_a = xxx and col_partition = xxx) or (col_b=xxx and col_partition 
> =  xxx)" 
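
To illustrate why the condition quoted above is still prunable, here is a small Python sketch of the rule: an AND can keep whichever side constrains the partition column, while an OR is usable only if both sides constrain it. Filters are modelled as plain tuples rather than Spark Catalyst expressions, and all names (`extract_segment_filter`, the tuple layout, the sample values) are hypothetical.

```python
from typing import Optional, Tuple

# A filter is a nested tuple:
#   ("and", left, right), ("or", left, right), or ("leaf", column, predicate)
Filter = Tuple


def extract_segment_filter(f: Filter, partition_col: str) -> Optional[Filter]:
    """Return the part of `f` that only constrains `partition_col`, or None."""
    kind = f[0]
    if kind == "or":
        left = extract_segment_filter(f[1], partition_col)
        right = extract_segment_filter(f[2], partition_col)
        # If either branch says nothing about the partition, no segment can be skipped.
        if left is None or right is None:
            return None
        return ("or", left, right)
    if kind == "and":
        left = extract_segment_filter(f[1], partition_col)
        right = extract_segment_filter(f[2], partition_col)
        if left is not None and right is not None:
            return ("and", left, right)
        # One constrained side is enough for an AND.
        return left if left is not None else right
    # Leaf predicate: keep it only if it is on the partition column.
    return f if f[1] == partition_col else None


condition = (
    "or",
    ("and", ("leaf", "col_a", "= 1"), ("leaf", "col_partition", "= 10")),
    ("and", ("leaf", "col_b", "= 2"), ("leaf", "col_partition", "= 20")),
)
print(extract_segment_filter(condition, "col_partition"))
```

For the sample condition the result keeps only the two partition predicates joined by OR, which is enough to decide which segments to scan.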




[jira] [Commented] (KYLIN-4547) parse crc file error when building cube with mapreduce

2021-04-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/KYLIN-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17331842#comment-17331842
 ] 

ASF GitHub Bot commented on KYLIN-4547:
---

LingangJiang edited a comment on pull request #1245:
URL: https://github.com/apache/kylin/pull/1245#issuecomment-826266565








> parse crc file error when building cube with mapreduce
> --
>
> Key: KYLIN-4547
> URL: https://issues.apache.org/jira/browse/KYLIN-4547
> Project: Kylin
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: v3.0.0
> Environment: hadoop version: CDH-5.12.2-1.cdh5.12.2
>Reporter: steven-qin
>Assignee: steven-qin
>Priority: Major
>
> The crc file cannot be parsed when building a cube with MapReduce.
> Here is the exception log:
> {code:java}
> // code placeholder
> 2020-06-08 10:08:23,821 INFO [main] org.apache.kylin.common.KylinConfig: 
> Creating new manager instance of class org.apache.kylin.cube.CubeManager
> 2020-06-08 10:08:23,844 INFO [main] org.apache.kylin.cube.CubeManager: 
> Initializing CubeManager with config 
> kylin_metadata30@ifile,path=/yarn/nm/usercache/kylin/appcache/application_1590134125851_1782/container_e55_1590134125851_1782_01_03/meta
> 2020-06-08 10:08:23,847 INFO [main] 
> org.apache.kylin.common.persistence.ResourceStore: Using metadata url 
> kylin_metadata30@ifile,path=/yarn/nm/usercache/kylin/appcache/application_1590134125851_1782/container_e55_1590134125851_1782_01_03/meta
>  for resource store
> 2020-06-08 10:08:24,303 ERROR [main] 
> org.apache.kylin.common.persistence.ResourceStore: Error reading resource 
> /cube/.kylin_sales_cube.json.crc
> org.apache.kylin.job.shaded.com.fasterxml.jackson.core.JsonParseException: 
> Invalid UTF-8 middle byte 0xd2
>  at [Source: (DataInputStream); line: 1, column: 11]
>   at 
> org.apache.kylin.job.shaded.com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
>   at 
> org.apache.kylin.job.shaded.com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:663)
>   at 
> org.apache.kylin.job.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidOther(UTF8StreamJsonParser.java:3543)
>   at 
> org.apache.kylin.job.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._decodeCharForError(UTF8StreamJsonParser.java:3288)
>   at 
> org.apache.kylin.job.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3514)
>   at 
> org.apache.kylin.job.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2621)
>   at 
> org.apache.kylin.job.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:826)
>   at 
> org.apache.kylin.job.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:723)
>   at 
> org.apache.kylin.job.shaded.com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:4129)
>   at 
> org.apache.kylin.job.shaded.com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3988)
>   at 
> org.apache.kylin.job.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3058)
>   at org.apache.kylin.common.util.JsonUtil.readValue(JsonUtil.java:73)
>   at 
> org.apache.kylin.common.persistence.JsonSerializer.deserialize(JsonSerializer.java:46)
>   at 
> org.apache.kylin.common.persistence.ContentReader.readContent(ContentReader.java:40)
>   at 
> org.apache.kylin.common.persistence.ResourceStore$3.visit(ResourceStore.java:259)
>   at 
> org.apache.kylin.common.persistence.FileResourceStore.visitFolderImpl(FileResourceStore.java:87)
>   at 
> org.apache.kylin.common.persistence.ResourceStore.visitFolderInner(ResourceStore.java:766)
>   at 
> org.apache.kylin.common.persistence.ResourceStore.visitFolderAndContent(ResourceStore.java:751)
>   at 
> org.apache.kylin.common.persistence.ResourceStore.lambda$getAllResourcesMap$0(ResourceStore.java:255)
>   at 
> org.apache.kylin.common.persistence.ExponentialBackoffRetry.doWithRetry(ExponentialBackoffRetry.java:52)
>   at 
> org.apache.kylin.common.persistence.ResourceStore.getAllResourcesMap(ResourceStore.java:253)
>   at 
> org.apache.kylin.metadata.cachesync.CachedCrudAssist.reloadAll(CachedCrudAssist.java:127)
>   at org.apache.kylin.cube.CubeManager.<init>(CubeManager.java:152)
>   at
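
The failing resource name, /cube/.kylin_sales_cube.json.crc, has the shape of a Hadoop checksum sidecar file (`.<name>.crc`), which holds binary data rather than JSON. One possible guard, sketched here in Python with the hypothetical helpers `is_checksum_file` and `json_resources` and not taken from the pull request, is to skip such files when listing a metadata folder.

```python
import posixpath


def is_checksum_file(resource_path: str) -> bool:
    """True for names shaped like Hadoop checksum sidecars, e.g. `.foo.json.crc`."""
    name = posixpath.basename(resource_path)
    return name.startswith(".") and name.endswith(".crc")


def json_resources(paths):
    """Keep only the resources that are safe to hand to a JSON deserializer."""
    return [p for p in paths if not is_checksum_file(p)]


print(json_resources([
    "/cube/kylin_sales_cube.json",
    "/cube/.kylin_sales_cube.json.crc",
]))
```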

[GitHub] [kylin] LingangJiang edited a comment on pull request #1245: KYLIN-4547 parse crc file error when building cube with mapreduce

2021-04-26 Thread GitBox



LingangJiang edited a comment on pull request #1245:
URL: https://github.com/apache/kylin/pull/1245#issuecomment-826266565







[GitHub] [kylin] zzcclp commented on pull request #1642: KYLIN-4980 Support prunning segments from complex filter co…

2021-04-26 Thread GitBox



zzcclp commented on pull request #1642:
URL: https://github.com/apache/kylin/pull/1642#issuecomment-826353467


   LGTM



[GitHub] [kylin] zzcclp merged pull request #1642: KYLIN-4980 Support prunning segments from complex filter co…

2021-04-26 Thread GitBox



zzcclp merged pull request #1642:
URL: https://github.com/apache/kylin/pull/1642


   



[jira] [Commented] (KYLIN-4980) Support prunning segments from complex filter conditions

2021-04-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/KYLIN-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17331836#comment-17331836
 ] 

ASF GitHub Bot commented on KYLIN-4980:
---

zzcclp merged pull request #1642:
URL: https://github.com/apache/kylin/pull/1642


   




> Support prunning segments from complex filter conditions
> 
>
> Key: KYLIN-4980
> URL: https://issues.apache.org/jira/browse/KYLIN-4980
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v4.0.0-beta
>Reporter: ShengJun Zheng
>Assignee: ShengJun Zheng
>Priority: Major
> Fix For: v4.0.0-GA
>
>
> Segment pruner can't prune segment from complex filter conditions, like the 
> filter condition below:
> "where (col_a = xxx and col_partition = xxx) or (col_b=xxx and col_partition 
> =  xxx)" 




[GitHub] [kylin] lgtm-com[bot] commented on pull request #1646: fix bug

2021-04-26 Thread GitBox



lgtm-com[bot] commented on pull request #1646:
URL: https://github.com/apache/kylin/pull/1646#issuecomment-826481253


   This pull request **introduces 1 alert** when merging 
140df938c02c7a08766cb837674af8ec1d4b62af into 
1c41baf008ad6444ec4b036f9e62687aacbcfae4 - [view on 
LGTM.com](https://lgtm.com/projects/g/apache/kylin/rev/pr-bba53759148b611b877a4b25c227bac302ed1095)
   
   **new alerts:**
   
   * 1 for Unused format argument



[jira] [Commented] (KYLIN-4547) parse crc file error when building cube with mapreduce

2021-04-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/KYLIN-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17331822#comment-17331822
 ] 

ASF GitHub Bot commented on KYLIN-4547:
---

LingangJiang commented on pull request #1245:
URL: https://github.com/apache/kylin/pull/1245#issuecomment-826266565








> parse crc file error when building cube with mapreduce
> --
>
> Key: KYLIN-4547
> URL: https://issues.apache.org/jira/browse/KYLIN-4547
> Project: Kylin
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: v3.0.0
> Environment: hadoop version: CDH-5.12.2-1.cdh5.12.2
>Reporter: steven-qin
>Assignee: steven-qin
>Priority: Major
>
> The crc file cannot be parsed when building a cube with MapReduce.
> Here is the exception log:
> {code:java}
> // code placeholder
> 2020-06-08 10:08:23,821 INFO [main] org.apache.kylin.common.KylinConfig: 
> Creating new manager instance of class org.apache.kylin.cube.CubeManager
> 2020-06-08 10:08:23,844 INFO [main] org.apache.kylin.cube.CubeManager: 
> Initializing CubeManager with config 
> kylin_metadata30@ifile,path=/yarn/nm/usercache/kylin/appcache/application_1590134125851_1782/container_e55_1590134125851_1782_01_03/meta
> 2020-06-08 10:08:23,847 INFO [main] 
> org.apache.kylin.common.persistence.ResourceStore: Using metadata url 
> kylin_metadata30@ifile,path=/yarn/nm/usercache/kylin/appcache/application_1590134125851_1782/container_e55_1590134125851_1782_01_03/meta
>  for resource store
> 2020-06-08 10:08:24,303 ERROR [main] 
> org.apache.kylin.common.persistence.ResourceStore: Error reading resource 
> /cube/.kylin_sales_cube.json.crc
> org.apache.kylin.job.shaded.com.fasterxml.jackson.core.JsonParseException: 
> Invalid UTF-8 middle byte 0xd2
>  at [Source: (DataInputStream); line: 1, column: 11]
>   at 
> org.apache.kylin.job.shaded.com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
>   at 
> org.apache.kylin.job.shaded.com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:663)
>   at 
> org.apache.kylin.job.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidOther(UTF8StreamJsonParser.java:3543)
>   at 
> org.apache.kylin.job.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._decodeCharForError(UTF8StreamJsonParser.java:3288)
>   at 
> org.apache.kylin.job.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3514)
>   at 
> org.apache.kylin.job.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2621)
>   at 
> org.apache.kylin.job.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:826)
>   at 
> org.apache.kylin.job.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:723)
>   at 
> org.apache.kylin.job.shaded.com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:4129)
>   at 
> org.apache.kylin.job.shaded.com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3988)
>   at 
> org.apache.kylin.job.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3058)
>   at org.apache.kylin.common.util.JsonUtil.readValue(JsonUtil.java:73)
>   at 
> org.apache.kylin.common.persistence.JsonSerializer.deserialize(JsonSerializer.java:46)
>   at 
> org.apache.kylin.common.persistence.ContentReader.readContent(ContentReader.java:40)
>   at 
> org.apache.kylin.common.persistence.ResourceStore$3.visit(ResourceStore.java:259)
>   at 
> org.apache.kylin.common.persistence.FileResourceStore.visitFolderImpl(FileResourceStore.java:87)
>   at 
> org.apache.kylin.common.persistence.ResourceStore.visitFolderInner(ResourceStore.java:766)
>   at 
> org.apache.kylin.common.persistence.ResourceStore.visitFolderAndContent(ResourceStore.java:751)
>   at 
> org.apache.kylin.common.persistence.ResourceStore.lambda$getAllResourcesMap$0(ResourceStore.java:255)
>   at 
> org.apache.kylin.common.persistence.ExponentialBackoffRetry.doWithRetry(ExponentialBackoffRetry.java:52)
>   at 
> org.apache.kylin.common.persistence.ResourceStore.getAllResourcesMap(ResourceStore.java:253)
>   at 
> org.apache.kylin.metadata.cachesync.CachedCrudAssist.reloadAll(CachedCrudAssist.java:127)
>   at org.apache.kylin.cube.CubeManager.<init>(CubeManager.java:152)
>   at

[GitHub] [kylin] LingangJiang commented on pull request #1245: KYLIN-4547 parse crc file error when building cube with mapreduce

2021-04-26 Thread GitBox



LingangJiang commented on pull request #1245:
URL: https://github.com/apache/kylin/pull/1245#issuecomment-826266565







[jira] [Commented] (KYLIN-4980) Support prunning segments from complex filter conditions

2021-04-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/KYLIN-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17331810#comment-17331810
 ] 

ASF GitHub Bot commented on KYLIN-4980:
---

zhengshengjun commented on a change in pull request #1642:
URL: https://github.com/apache/kylin/pull/1642#discussion_r619646558



##
File path: 
kylin-spark-project/kylin-spark-common/src/main/scala/org/apache/spark/sql/execution/datasource/FilePruner.scala
##
@@ -295,8 +295,48 @@ class FilePruner(cubeInstance: CubeInstance,
 }
   }
 
-  private def getSpecFilter(dataFilters: Seq[Expression], col: Attribute): Seq[Expression] = {
-    dataFilters.filter(_.references.subsetOf(AttributeSet(col)))
+  private def getSegmentFilter(dataFilters: Seq[Expression], col: Attribute): Seq[Expression] = {
+    dataFilters.map(extractSegmentFilter(_, col)).filter(!_.equals(None)).map(_.get)
+  }
+
+  private def extractSegmentFilter(filter: Expression, col: Attribute): Option[Expression] = {
+    filter match {
+      case expressions.Or(left, right) =>
+        val leftChild = extractSegmentFilter(left, col)
+        val rightChild = extractSegmentFilter(right, col)
+
+        //if there exists leaf-node that doesn't contain partition column, the parent filter is
+        //unnecessary for segment prunning.
+        //e.g. "where a = xxx or partition = xxx", we can't filter any segment
+        if (leftChild.eq(None) || rightChild.eq(None)) {
+          None
+        } else {
+          Some(expressions.Or(leftChild.get, rightChild.get))
+        }
+      case expressions.And(left, right) =>
+        val leftChild = extractSegmentFilter(left, col)
+        val rightChild = extractSegmentFilter(right, col)
+
+        //if there is only one leaf-node that contains partition column
+        //e.g. "where a = xxx and partition = xxx",
+        //then we can filter segment using "where partition = xxx"
+        if (!leftChild.eq(None) && !rightChild.eq(None)) {
+          Some(expressions.And(leftChild.get, rightChild.get))
+        } else if (!rightChild.eq(None)) {
+          rightChild
+        } else if (!leftChild.eq(None)) {
+          leftChild
+        } else {
+          None
+        }
+      case _ =>
+        //other unary filter like EqualTo, GreaterThan, GreaterThanOrEqual, etc.
+        if (filter.references.contains(col)) {

Review comment:
   good suggestion ~






> Support prunning segments from complex filter conditions
> 
>
> Key: KYLIN-4980
> URL: https://issues.apache.org/jira/browse/KYLIN-4980
> Project: Kylin
>  Issue Type: Improvement
>  Components: Query Engine
>Affects Versions: v4.0.0-beta
>Reporter: ShengJun Zheng
>Assignee: ShengJun Zheng
>Priority: Major
> Fix For: v4.0.0-GA
>
>
> Segment pruner can't prune segment from complex filter conditions, like the 
> filter condition below:
> "where (col_a = xxx and col_partition = xxx) or (col_b=xxx and col_partition 
> =  xxx)" 




[GitHub] [kylin] zhengshengjun commented on a change in pull request #1642: KYLIN-4980 Support prunning segments from complex filter co…

2021-04-26 Thread GitBox



zhengshengjun commented on a change in pull request #1642:
URL: https://github.com/apache/kylin/pull/1642#discussion_r619646558



##
File path: 
kylin-spark-project/kylin-spark-common/src/main/scala/org/apache/spark/sql/execution/datasource/FilePruner.scala
##
@@ -295,8 +295,48 @@ class FilePruner(cubeInstance: CubeInstance,
 }
   }
 
-  private def getSpecFilter(dataFilters: Seq[Expression], col: Attribute): Seq[Expression] = {
-    dataFilters.filter(_.references.subsetOf(AttributeSet(col)))
+  private def getSegmentFilter(dataFilters: Seq[Expression], col: Attribute): Seq[Expression] = {
+    dataFilters.map(extractSegmentFilter(_, col)).filter(!_.equals(None)).map(_.get)
+  }
+
+  private def extractSegmentFilter(filter: Expression, col: Attribute): Option[Expression] = {
+    filter match {
+      case expressions.Or(left, right) =>
+        val leftChild = extractSegmentFilter(left, col)
+        val rightChild = extractSegmentFilter(right, col)
+
+        //if there exists leaf-node that doesn't contain partition column, the parent filter is
+        //unnecessary for segment prunning.
+        //e.g. "where a = xxx or partition = xxx", we can't filter any segment
+        if (leftChild.eq(None) || rightChild.eq(None)) {
+          None
+        } else {
+          Some(expressions.Or(leftChild.get, rightChild.get))
+        }
+      case expressions.And(left, right) =>
+        val leftChild = extractSegmentFilter(left, col)
+        val rightChild = extractSegmentFilter(right, col)
+
+        //if there is only one leaf-node that contains partition column
+        //e.g. "where a = xxx and partition = xxx",
+        //then we can filter segment using "where partition = xxx"
+        if (!leftChild.eq(None) && !rightChild.eq(None)) {
+          Some(expressions.And(leftChild.get, rightChild.get))
+        } else if (!rightChild.eq(None)) {
+          rightChild
+        } else if (!leftChild.eq(None)) {
+          leftChild
+        } else {
+          None
+        }
+      case _ =>
+        //other unary filter like EqualTo, GreaterThan, GreaterThanOrEqual, etc.
+        if (filter.references.contains(col)) {

Review comment:
   good suggestion ~





[jira] [Commented] (KYLIN-4971) Add new measure bitmap_map for count distinct measure in UI

2021-04-26 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/KYLIN-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17331798#comment-17331798
 ] 

ASF GitHub Bot commented on KYLIN-4971:
---

zhangayqian commented on pull request #1633:
URL: https://github.com/apache/kylin/pull/1633#issuecomment-826212807


   @Ted-Jiang Nice! Can you squash these two commits into one?




> Add new measure bitmap_map for count distinct  measure in UI
> 
>
> Key: KYLIN-4971
> URL: https://issues.apache.org/jira/browse/KYLIN-4971
> Project: Kylin
>  Issue Type: Task
>Reporter: JiangYang
>Assignee: JiangYang
>Priority: Minor
> Fix For: v3.1.3
>
>





[GitHub] [kylin] zhangayqian commented on pull request #1633: [KYLIN-4971] Add new measure bitmap_map for count distinct at fronten…

2021-04-26 Thread GitBox



zhangayqian commented on pull request #1633:
URL: https://github.com/apache/kylin/pull/1633#issuecomment-826212807


   @Ted-Jiang Nice! Can you squash these two commits into one?


