[jira] [Commented] (KYLIN-5731) When there is a null value in the Kafka source data, the build job reports an error

2023-12-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795598#comment-17795598
 ] 

ASF GitHub Bot commented on KYLIN-5731:
---

thy950523 opened a new pull request, #2161:
URL: https://github.com/apache/kylin/pull/2161

   ## Proposed changes
   Describe the big picture of your changes here to communicate to the 
maintainers why we should accept this pull request. If it fixes a bug or 
resolves a feature request, be sure to link to that issue.
   
   ## Branch to commit
   * [ ]  Branch **kylin3** for v2.x to v3.x
   * [ ]  Branch **kylin4** for v4.x
   * [x]  Branch **kylin5** for v5.x
   
   ## Types of changes
   What types of changes does your code introduce to Kylin? _Put an `x` in the 
boxes that apply_
   
   * [x]  Bugfix (non-breaking change which fixes an issue)
   * [ ]  New feature (non-breaking change which adds functionality)
   * [ ]  Breaking change (fix or feature that would cause existing 
functionality to not work as expected)
   * [ ]  Documentation Update (if none of the other choices apply)
   
   ## Checklist
   _Put an `x` in the boxes that apply. You can also fill these out after 
creating the PR. If you're unsure about any of them, don't hesitate to ask. 
We're here to help! This is simply a reminder of what we are going to look for 
before merging your code._
   
   * [x]  I have created an issue on [Kylin's 
jira](https://issues.apache.org/jira/browse/KYLIN), and have described the 
bug/feature there in detail
   * [x]  Commit messages in my PR start with the related jira ID, like 
"KYLIN- Make Kylin project open-source"
   * [x]  Compiling and unit tests pass locally with my changes
   * [x]  I have added tests that prove my fix is effective or that my feature 
works
   * [x]  I have added necessary documentation (if appropriate)
   * [x]  Any dependent changes have been merged
   
   ## Further comments
   If this is a relatively large or complex change, kick off the discussion at 
[u...@kylin.apache.org](mailto:u...@kylin.apache.org) or 
[d...@kylin.apache.org](mailto:d...@kylin.apache.org) by explaining why you 
chose the solution you did and what alternatives you considered, etc...
   
   




> When there is a null value in the Kafka source data, the build job reports an 
> error
> ---
>
> Key: KYLIN-5731
> URL: https://issues.apache.org/jira/browse/KYLIN-5731
> Project: Kylin
> Issue Type: Bug
> Affects Versions: 5.0-beta
> Reporter: zhong.zhu
> Assignee: zhong.zhu
> Priority: Minor
> Fix For: 5.0.0
>
>
> If a field value in the Kafka JSON data is null, the build job reports an error.
> Example of a null-valued field: "clue_source_2_name":null
> The field type is varchar.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] KYLIN-5731 ~ KYLIN-5747 merge code into kylin5 [kylin]

2023-12-11 Thread via GitHub


thy950523 opened a new pull request, #2161:
URL: https://github.com/apache/kylin/pull/2161



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@kylin.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (KYLIN-5731) When there is a null value in the Kafka source data, the build job reports an error

2023-12-11 Thread zhong.zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhong.zhu updated KYLIN-5731:
-
Description: 
If a field value in the Kafka JSON data is null, the build job reports an error.
Example of a null-valued field: "clue_source_2_name":null
The field type is varchar.
 

  was:
If the field value in kafka json data is null, the task will report an error.

Null value field "clue_source_2_name":null

Field type is varchar
 


> When there is a null value in the Kafka source data, the build job reports an 
> error
> ---
>
> Key: KYLIN-5731
> URL: https://issues.apache.org/jira/browse/KYLIN-5731
> Project: Kylin
> Issue Type: Bug
> Affects Versions: 5.0-beta
> Reporter: zhong.zhu
> Assignee: zhong.zhu
> Priority: Minor
> Fix For: 5.0.0
>
>
> If a field value in the Kafka JSON data is null, the build job reports an error.
> Example of a null-valued field: "clue_source_2_name":null
> The field type is varchar.
>  





[jira] [Commented] (KYLIN-5743) Set kylin.query.convert-sum-expression-enabled=true, fail to completely hit the aggregate index when the query contains sum (case when) expressions

2023-12-11 Thread zhong.zhu (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795303#comment-17795303
 ] 

zhong.zhu commented on KYLIN-5743:
--

h1.Root Cause
With the conversion switch _*kylin.query.convert-sum-expression-enabled=true*_
turned on, the original SQL produces the following execution plan:
{code:shell}
KapOLAPToEnumerableConverter
  KapLimitRel(ctx=[], fetch=[500])
    KapAggregateRel(group-set=[[]], groups=[null], EXPR$0=[SUM($0)], ctx=[])
      KapProjectRel($f0=[$1], ctx=[])
        KapJoinRel(condition=[=($0, $2)], joinType=[inner], ctx=[])
          KapProjectRel(LO_COMMITDATE=[$15], CASE=[CASE(=($15, CAST('20230501'):DATE NOT NULL), $11, null)], ctx=[])
            KapTableScan(table=[[SSB, LINEORDER]], ctx=[], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
          KapProjectRel(EXPR$0=[$0], ctx=[])
            KapAggregateRel(group-set=[[]], groups=[null], EXPR$0=[MAX($0)], ctx=[])
              KapProjectRel(LO_COMMITDATE=[$15], ctx=[])
                KapTableScan(table=[[SSB, LINEORDER]], ctx=[], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
{code}
In this plan the CASE WHEN expression has been pushed down below the join, to
the projection directly above the TableScan (the push-down combined with
ProjectMergeRule gives the result above), so SumExpressionRule no longer
matches. The outcome is not stable: the optimizer may instead produce the
following execution plan:
{code:shell}
KapOLAPToEnumerableConverter
  KapLimitRel(ctx=[], fetch=[500])
    KapAggregateRel(group-set=[[]], groups=[null], AGG$0=[SUM($0)], ctx=[])
      KapProjectRel($f0=[CASE(=($0, CAST('20230501'):DATE NOT NULL), $1, null)], ctx=[])
        KapAggregateRel(group-set=[[0]], groups=[null], TOP_AGG$0=[SUM($1)], TOP_AGG$1=[SUM($2)], ctx=[])
          KapProjectRel(LO_COMMITDATE=[$0], SUM_CASE$0$0=[$1], $f2=[*(0, $2)], ctx=[])
            KapAggregateRel(group-set=[[0]], groups=[null], SUM_CASE$0$0=[SUM($1)], SUM_CONST$1=[COUNT()], ctx=[])
              KapProjectRel(LO_COMMITDATE=[$0], LO_DISCOUNT=[$1], ctx=[])
                KapProjectRel(LO_COMMITDATE=[$0], LO_DISCOUNT=[$1], LO_ORDERDATE=[$2], ctx=[])
                  KapJoinRel(condition=[=($0, $3)], joinType=[inner], ctx=[])
                    KapProjectRel(LO_COMMITDATE=[$15], LO_DISCOUNT=[$11], LO_ORDERDATE=[$5], ctx=[])
                      KapTableScan(table=[[SSB, LINEORDER]], ctx=[], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
                    KapAggregateRel(group-set=[[]], groups=[null], EXPR$0=[MAX($0)], ctx=[])
                      KapProjectRel(LO_COMMITDATE=[$15], ctx=[])
                        KapTableScan(table=[[SSB, LINEORDER]], ctx=[], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
{code}

h1.Fix Design
When *_kylin.query.convert-sum-expression-enabled=true_*, skip the
_*KapProjectJoinTransposeRule*_. This makes the second execution plan above the
stable outcome, so the query hits two aggregate indexes instead of one
aggregate index and one detail index.
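
The idea of the fix can be pictured as a conditional filter over the planner's
rule list. The sketch below is only an illustration: `RuleSelector`,
`selectRules`, and the use of plain strings to stand for rules are assumptions
made for this example and are not Kylin's planner code. Only the configuration
key and the rule names are taken from this comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: rules are modelled as plain names to keep the sketch self-contained.
class RuleSelector {
    static final String SUM_EXPR_KEY = "kylin.query.convert-sum-expression-enabled";
    static final String PROJECT_JOIN_TRANSPOSE = "KapProjectJoinTransposeRule";

    // Returns the candidate rules, minus the transpose rule when the switch is on.
    static List<String> selectRules(List<String> candidates, Properties config) {
        boolean sumExprEnabled = Boolean.parseBoolean(config.getProperty(SUM_EXPR_KEY, "false"));
        List<String> selected = new ArrayList<>();
        for (String rule : candidates) {
            if (sumExprEnabled && PROJECT_JOIN_TRANSPOSE.equals(rule)) {
                continue; // skip so the CASE WHEN stays above the join
            }
            selected.add(rule);
        }
        return selected;
    }
}
```

With the switch off, the rule list is returned unchanged, so queries that do not
rely on sum expression conversion keep their existing plans.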

> Set kylin.query.convert-sum-expression-enabled=true, fail to completely hit 
> the aggregate index when the query contains sum (case when) expressions
> ---
>
> Key: KYLIN-5743
> URL: https://issues.apache.org/jira/browse/KYLIN-5743
> Project: Kylin
> Issue Type: Bug
> Affects Versions: 5.0-beta
> Reporter: zhong.zhu
> Assignee: zhong.zhu
> Priority: Major
> Fix For: 5.0.0
>
>
> {code:sql}
> select
>   sum(
>     case
>       when LO_COMMITDATE = '20230501' then LO_DISCOUNT
>     end
>   )
> from
>   (
>     select
>       LO_COMMITDATE,
>       LO_DISCOUNT,
>       LINEORDER.LO_ORDERDATE
>     from
>       ssb.LINEORDER
>   ) a
> where
>   LO_COMMITDATE = (
>     select
>       max(LO_COMMITDATE)
>     from
>       ssb.LINEORDER
>   )
> LIMIT
>   500
> {code}
> Fix the handling of sum(case when) in this scenario so that the query hits aggregate indexes.





[jira] [Commented] (KYLIN-5741) When using the API in project settings API to update the linking relationship between projects and job engines, an error is reported when the projects parameter is empt

2023-12-11 Thread zhong.zhu (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795300#comment-17795300
 ] 

zhong.zhu commented on KYLIN-5741:
--

h1.Dev Design
When the projects parameter is empty, the epochs of all projects are updated by 
default.
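
A minimal sketch of this defaulting rule is shown below. The class and method
names (`EpochTargetResolver`, `resolveTargets`) are hypothetical and chosen
only for illustration; the actual API handler in Kylin is not shown in this
comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Kylin's code: decides which projects' epochs to update.
class EpochTargetResolver {
    // An empty or missing "projects" parameter means "all projects".
    static List<String> resolveTargets(List<String> requested, List<String> allProjects) {
        if (requested == null || requested.isEmpty()) {
            return new ArrayList<>(allProjects);
        }
        return new ArrayList<>(requested);
    }
}
```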

> When using the API in project settings API to update the linking relationship 
> between projects and job engines, an error is reported when the projects 
> parameter is empty
> -
>
> Key: KYLIN-5741
> URL: https://issues.apache.org/jira/browse/KYLIN-5741
> Project: Kylin
> Issue Type: Bug
> Affects Versions: 5.0-beta
> Reporter: zhong.zhu
> Assignee: zhong.zhu
> Priority: Critical
> Fix For: 5.0.0
>
> Attachments: image-2023-12-11-14-44-04-923.png
>
>
> !image-2023-12-11-14-44-04-923.png!





[jira] [Commented] (KYLIN-5734) Problems with task scheduling logic

2023-12-11 Thread zhong.zhu (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795299#comment-17795299
 ] 

zhong.zhu commented on KYLIN-5734:
--

h1. Fix Design

In _*org.apache.kylin.jobs.execution.ExecutableContext#addRunningJob*_, the
logic that records the current thread
({*}runningJobThreads.put(executable.getId(), Thread.currentThread()){*}) is
split out into a separate step. addRunningJob only records which tasks have
been scheduled, so that they are not scheduled again. Because it runs on the
scheduling thread, it should not register the current thread; the thread
should be registered when the task is actually executed.
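
The separation can be sketched as follows. This is a simplified stand-in, not
the real `ExecutableContext`: the class name `RunningJobRegistry` and the
methods `addRunningJobThread` and `getJobThread` are hypothetical, while
`runningJobs`, `runningJobThreads`, and `addRunningJob` mirror the names used
in this comment.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Simplified, hypothetical stand-in for the bookkeeping described above.
class RunningJobRegistry {
    private final Set<String> runningJobs = ConcurrentHashMap.newKeySet();
    private final Map<String, Thread> runningJobThreads = new ConcurrentHashMap<>();

    // Called on the scheduler thread: only records that the job has been scheduled.
    // Returns false if the job was already scheduled.
    boolean addRunningJob(String jobId) {
        return runningJobs.add(jobId);
    }

    // Called at the start of the job's own execution, i.e. on the worker thread.
    void addRunningJobThread(String jobId) {
        runningJobThreads.put(jobId, Thread.currentThread());
    }

    // The thread to interrupt when the job is killed; null until the job starts running.
    Thread getJobThread(String jobId) {
        return runningJobThreads.get(jobId);
    }

    void removeRunningJob(String jobId) {
        runningJobs.remove(jobId);
        runningJobThreads.remove(jobId);
    }
}
```

With this split, killing a job interrupts the thread returned by
`getJobThread`, which is the worker that registered itself, never the
scheduler thread.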

> Problems with task scheduling logic
> ---
>
> Key: KYLIN-5734
> URL: https://issues.apache.org/jira/browse/KYLIN-5734
> Project: Kylin
> Issue Type: Bug
> Affects Versions: 5.0-beta
> Reporter: zhong.zhu
> Assignee: zhong.zhu
> Priority: Major
> Fix For: 5.0.0
>
>
> When a task is scheduled, the task is logged into runningJobs and the current 
> thread is logged into runningJobThreads, which is expected to be the thread 
> executing the task, but is actually the thread of the scheduler, which leads 
> to subsequent attempts to interrupt the scheduler FetcherRunner when the task 
> is killed.





[jira] [Commented] (KYLIN-5733) Export model TDS file in English interface, including Chinese when opening the file in text mode

2023-12-11 Thread zhong.zhu (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795297#comment-17795297
 ] 

zhong.zhu commented on KYLIN-5733:
--

h1.Root Cause
When the TDS file is exported, the tableau.template.xml template is used. The
template contains "USA" written in Chinese characters, so the exported TDS file
contains those Chinese characters as well. This is unrelated to whether the
Kylin interface is set to Chinese or English.
{code:xml}



{code}

h1.Dev Design
To minimize changes and avoid affecting existing functionality, the Chinese
text for "USA" in the template is replaced with "US" to meet the customer's
needs.

This content was introduced in Kylin 3.x in 2018, and the original purpose of
the tag can no longer be determined. In testing, the file still displays and
queries normally in Tableau after the semantic-values tag is removed, and also
after "USA" is changed to "US". For now, only the Chinese characters in the
template are changed.


> Export model TDS file in English interface, including Chinese when opening 
> the file in text mode
> 
>
> Key: KYLIN-5733
> URL: https://issues.apache.org/jira/browse/KYLIN-5733
> Project: Kylin
> Issue Type: Bug
> Affects Versions: 5.0-beta
> Reporter: zhong.zhu
> Assignee: zhong.zhu
> Priority: Critical
> Fix For: 5.0.0
>
>
> *Steps to reproduce the issue:*
>  # Go to a model and click “Export TDS”.
>  # Open up the file in a text editor and look at the bottom. There are 
> Chinese characters. See attachment.
> I confirmed this issue is present in 4.5.4 and .11, and likely exists in 
> other versions. The TDS file seems to work fine though.





[jira] [Commented] (KYLIN-5747) Calcite constant folding, adding strings to numbers, results not as expected when multiple plus signs are used together

2023-12-11 Thread zhong.zhu (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795292#comment-17795292
 ] 

zhong.zhu commented on KYLIN-5747:
--

h1.Root Cause
When several plus signs are chained and all of the arguments are constants,
Calcite performs constant folding. To compute the value, it translates the
expression into Java code, and for each plus sign it decides whether to call a
custom plus function or to use Java's native addition.

A plus sign is converted into the custom plus function when any one of three
conditions is met: 1. the left operand is not a primitive type; 2. the right
operand is of type BigDecimal; 3. a third, more complex condition.

For '1' + 3 + '3', the result of '1' + 3 is of type double and '3' is of type
string, so the second plus sign meets none of the three conditions. The
expression is therefore translated into the Java code plus('1', 3) + '3', where
the remaining + is Java string concatenation. This is consistent with the
reported result 4.03 (4.0 followed by 3).


h1.Dev Design
1. In Calcite's Java code generation, where it decides whether a + sign is
converted to plus() or uses Java's native addition, add a condition: when the
operands on both sides of the plus sign are of string or numeric type, use
plus() directly.
For the earlier example '1' + 3 + '3', the generated Java code was
plus('1', 3) + '3'.
After the fix it is plus(plus('1', 3), '3').
2. Change the return type of the custom plus function in Calcite from Double
to BigDecimal.
The reason:
- When Calcite generates Java code for constant folding, it derives
nullability (isNullable), and when a call is derived as non-nullable, its
result is automatically unboxed.
- Calcite considers the expression 'a' + 3 non-nullable, because both arguments
are constants and neither of them is null.
- In Spark, 'a' + 3 evaluates to null, so our implementation of the plus
method also returns null for 'a' + 3.
- In summary, when plus(string, number) returns Double, the Java code for
'a' + 3 + '3' is actually plus(plus('a', 3).doubleValue(), '3'), which throws
an NPE when it is evaluated.

The isNullable logic affects a wide scope and modifying it directly is risky,
so instead the return type of plus is changed to BigDecimal, which avoids the
unboxing.
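
The effect of returning a nullable BigDecimal can be illustrated with a
standalone sketch. `LenientPlus` and `toDecimal` are hypothetical names for
this example; this is neither Calcite's generated code nor Kylin's actual plus
function, and it only mirrors the behavior described above (a non-numeric
string operand yields null, as in Spark).

```java
import java.math.BigDecimal;

// Hypothetical stand-in for a plus function that returns BigDecimal instead of Double.
class LenientPlus {
    // Converts an operand to BigDecimal; returns null for null or non-numeric input.
    static BigDecimal toDecimal(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof BigDecimal) {
            return (BigDecimal) value;
        }
        try {
            return new BigDecimal(value.toString().trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Null-propagating addition: no primitive unboxing happens between nested calls.
    static BigDecimal plus(Object left, Object right) {
        BigDecimal l = toDecimal(left);
        BigDecimal r = toDecimal(right);
        return (l == null || r == null) ? null : l.add(r);
    }
}
```

Here `plus(plus("1", 3), "3")` evaluates to 7, and `plus(plus("a", 3), "3")`
evaluates to null rather than throwing, because the inner null result is passed
on as an object instead of being unboxed with `doubleValue()`.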

> Calcite constant folding, adding strings to numbers, results not as expected 
> when multiple plus signs are used together
> ---
>
> Key: KYLIN-5747
> URL: https://issues.apache.org/jira/browse/KYLIN-5747
> Project: Kylin
> Issue Type: Bug
> Affects Versions: 5.0-beta
> Reporter: zhong.zhu
> Assignee: zhong.zhu
> Priority: Critical
> Fix For: 5.0.0
>
>
> Phenomenon:
> When more than one plus sign is used in a row and the parameters on both 
> sides of the plus sign are constants, the result is not as expected
> '1' + 3 + 3 → 7 (correct)
> '1' + 3 + '3' → 4.03 (wrong result)
> '1' + '3' + 'a' → error
> When multiple plus signs are used in a row, and the arguments on both sides 
> of the plus sign are constants, and the first plus sign results in null, the 
> use of plus signs in a row is not supported.
> e.g. 'q' + 1 + 1 -> error





[jira] [Created] (KYLIN-5747) Calcite constant folding, adding strings to numbers, results not as expected when multiple plus signs are used together

2023-12-11 Thread zhong.zhu (Jira)
zhong.zhu created KYLIN-5747:


 Summary: Calcite constant folding, adding strings to numbers, 
results not as expected when multiple plus signs are used together
 Key: KYLIN-5747
 URL: https://issues.apache.org/jira/browse/KYLIN-5747
 Project: Kylin
  Issue Type: Bug
Affects Versions: 5.0-beta
Reporter: zhong.zhu
Assignee: zhong.zhu
 Fix For: 5.0.0


Phenomenon:
When more than one plus sign is used in a row and the parameters on both sides 
of the plus sign are constants, the result is not as expected
'1' + 3 + 3 → 7 (correct)
'1' + 3 + '3' → 4.03 (wrong result)
'1' + '3' + 'a' → error

When multiple plus signs are used in a row, and the arguments on both sides of 
the plus sign are constants, and the first plus sign results in null, the use 
of plus signs in a row is not supported.
e.g. 'q' + 1 + 1 -> error








[jira] [Created] (KYLIN-5746) On the page, select online model operation offline, click the model online again, and put the model online button into ash.

2023-12-11 Thread zhong.zhu (Jira)
zhong.zhu created KYLIN-5746:


 Summary: On the page, select online model operation offline, click 
the model online again, and put the model online button into ash.
 Key: KYLIN-5746
 URL: https://issues.apache.org/jira/browse/KYLIN-5746
 Project: Kylin
  Issue Type: Bug
Affects Versions: 5.0-beta
Reporter: zhong.zhu
Assignee: zhong.zhu
 Fix For: 5.0.0
 Attachments: image-2023-12-11-17-33-31-889.png

Steps to reproduce:
1. Create a model, build data, and bring the model online.
2. Take the model offline.
3. Select the model again and choose to bring it online.

Actual result:
The model online button is grayed out, so the model cannot be brought online.
 !image-2023-12-11-17-33-31-889.png! 





[jira] [Updated] (KYLIN-5745) The historical garbage cleanup task was not completed, causing the subsequent scheduled garbage cleanup task cannot be executed normally

2023-12-11 Thread zhong.zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhong.zhu updated KYLIN-5745:
-
Description: 
{*}Problem description{*}: 
The scheduled garbage cleanup cannot complete successfully.


{*}Background{*}: 
The customer found that Kylin had a large number of small files occupying HDFS
storage that needed to be cleaned up. On checking the customer's environment,
we found that the scheduled garbage cleanup had never completed properly and
kept timing out.


*Troubleshooting:*
The check showed that, after Kylin was restarted on the night of April 5,
garbage cleanup was first triggered in the early morning of April 6. Once it
was triggered, the query history cleanup thread kept deleting from then on,
so subsequent periodic garbage cleanup tasks could not complete.

Query histories are deleted 2,000 rows at a time, and one of the customer's
projects had 550,000 query histories to delete. According to kylin.log, table
locking made the deletes slow, and a single delete operation took more than 20
minutes in some cases.

The following log shows the main garbage cleanup thread waiting for the query
history cleanup to complete. The cleanup did not complete, so the main thread
timed out and exited.


{code:shell}
2023-04-06T00:00:00,015 INFO  [RoutineOpsWorker-287] service.ScheduleService : execute task MetadataBackup with remaining time: 1435 ms
2023-04-06T00:01:52,649 INFO  [RoutineOpsWorker-287] service.ScheduleService : execute task QueryHistoriesCleanup with remaining time: 14287361 ms
...
2023-04-06T04:00:00,012 WARN  [DefaultTaskScheduler-3] service.ScheduleService : Routine task execution timeout
java.util.concurrent.TimeoutException: null
    at java.util.concurrent.FutureTask.get(FutureTask.java:205) ~[?:1.8.0_242]
    at org.apache.kylin.rest.service.ScheduleService.executeTask(ScheduleService.java:107) ~[kylin-job-service-5.0.0-ke-4.6.2.0.jar:?]
    at org.apache.kylin.rest.service.ScheduleService.routineTask(ScheduleService.java:77) ~[kylin-job-service-5.0.0-ke-4.6.2.0.jar:?]
    at org.apache.kylin.rest.service.ScheduleService$$FastClassBySpringCGLIB$$afbfc46c.invoke() ~[kylin-job-service-5.0.0-ke-4.6.2.0.jar:?]
{code}

The following log runs up to the latest time available. After 09:00 the query
history deletion was still in progress; it did not stop when the main thread
exited.
{code:shell}
2023-04-06T00:08:43,015 DEBUG [QueryHistoryCleanWorker-23145] QueryHistoryMapper.selectByProject : <==  Total: 12
2023-04-06T00:08:43,016 INFO  [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project is less than the maximum limit, so skip it.
2023-04-06T00:08:43,016 INFO  [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project is less than the maximum limit, so skip it.
2023-04-06T00:08:43,016 INFO  [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project is less than the maximum limit, so skip it.
2023-04-06T00:08:43,016 INFO  [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project is less than the maximum limit, so skip it.
2023-04-06T00:08:43,017 INFO  [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Start to delete query histories that are beyond max size for project, records:1551669
...
2023-04-06T09:03:54,974 INFO  [QueryHistoryCleanWorker-23145] query.JdbcQueryHistoryStore : Delete 2000 row query history for project [CXCZH] takes 938060 ms
2023-04-06T09:03:54,975 DEBUG [QueryHistoryCleanWorker-23145] QueryHistoryMapper.delete : ==>  Preparing: delete from ke4_instance_query_history_realization where query_time < ? and project_name = ?
2023-04-06T09:03:54,975 DEBUG [QueryHistoryCleanWorker-23145] QueryHistoryMapper.delete : ==> Parameters: 1678863450091(Long), CXCZH(String)
{code}


 

  was:
{*}Problem description{*}: 
Timed garbage cleanup operation cannot be completed successfully


{*}Background{*}: 
The customer found that Kylin has a large number of small files occupying hdfs 
storage, we need to clean up, we check the customer's environment and found 
that the timed garbage cleanup has not been completed properly, has been 
timeout!


*Troubleshooting:*
After the check, it is found that the customer's garbage clearing is triggered 
for the first time in the morning of 4.6 after KE is restarted on the night of 
4.5. After this clearing operation is triggered, the thread of query history 
has been deleted since then. As a result, subsequent periodic garbage clearing 
tasks cannot be completed

Delete 2,000 rows of data at a time, one of the customer's projects need to 
delete 550,000 query history, look at the kylin.log record, delete 
time-consuming because of table locking problems lead to a delete operation 
even reached 

[jira] [Commented] (KYLIN-5745) The historical garbage cleanup task was not completed, causing the subsequent scheduled garbage cleanup task cannot be executed normally

2023-12-11 Thread zhong.zhu (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795247#comment-17795247
 ] 

zhong.zhu commented on KYLIN-5745:
--

h1. Root Cause

Controllers in Spring are singletons, so every call to the service method below
cleans up the underlying HDFS files serially. Compared with cleaning up
metadata and query histories, this step is particularly time-consuming. In
addition, *_MetadataToolHelper_* is designed to run serially, so other callers
of this method are serialized as well.
{code:java}
MetadataToolHelper::cleanStorage{code}
A second design problem: Kylin uses two thread pools for the various kinds of
data cleanup. One is the Route service's single-threaded pool; the other is
Spring's built-in task pool, which has 5 threads by default. Tasks submitted to
either pool can involve cleaning up HDFS files, which can leave the whole Kylin
instance unavailable. HDFS file cleanup therefore needs its own separate thread
pool.
h1. Dev Design

Move the HDFS file cleanup logic onto a thread pool and add timeout handling.

A thread pool backed by a *PriorityBlockingQueue* priority task queue acts as
the task manager for storage garbage cleanup. It decouples the previously
serial cleanup of Class I, II, and III garbage by running the steps
asynchronously.
h1. Major Changes

1. Define the task types, in priority order: {*}SERVICE > CLI > ROUTINE{*}
2. Abstract task class with weights:
{code:java}
public abstract class AbstractComparableCleanTask implements Runnable, Comparable{code}
3. Custom thread pool and task cache queue: a configurable
{_}*CachedThreadPool*{_} with a specified _*PriorityBlockingQueue*_
4. New task manager:
{code:java}
public class CleanStoragesHelper implements Closeable{code}
5. New timeout mechanism: the core logic is built on *_Java CompletableFuture /
Future_*. The new parameters are as follows:
{*}_kylin.storage.clean-timeout=1h_{*}: the default timeout for cleanup tasks.
It applies only to query history and storage cleanup, and {color:#de350b}it
takes effect only outside the Routine and CLI scenarios;{color}
{*}_kylin.storage.clean-tasks-concurrency=5_{*}: the number of threads for
query history/HDFS garbage cleanup tasks, i.e. the maximum number of these two
types of tasks that run at the same time. Later submissions wait in the task
cache queue.
6. Asynchronous/synchronous mechanism: {_}*CleanStoragesHelper*{_} provides
both synchronous and asynchronous methods, and the caller chooses between them
according to the scenario.

7. Initialization timing for global classes: initialize the utility class
*_CleanStoragesHelper_* in *_AppInitializer_* to avoid problems caused by the
method *_KylinConfig.getInstanceFromEn_*, which may return a non-system
KylinConfig.
8. Track the life cycle of a task: {*}CREATE => SUBMIT => SUCCEED/FAILED{*}
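
The combination of prioritized tasks, a bounded number of worker threads, and a
timeout can be sketched as below. This is a minimal illustration under stated
assumptions, not the implementation in the pull request: `CleanTaskType`,
`ComparableCleanTask`, `CleanTaskPool`, `newPool`, and `executeAndWait` are
hypothetical names, and the sketch uses a plain `ThreadPoolExecutor` with a
`PriorityBlockingQueue` as its work queue.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical task types, declared lowest to highest priority (SERVICE > CLI > ROUTINE).
enum CleanTaskType { ROUTINE, CLI, SERVICE }

// Hypothetical weighted task: higher-priority types sort first in the queue.
abstract class ComparableCleanTask implements Runnable, Comparable<ComparableCleanTask> {
    final CleanTaskType type;

    ComparableCleanTask(CleanTaskType type) {
        this.type = type;
    }

    @Override
    public int compareTo(ComparableCleanTask other) {
        return Integer.compare(other.type.ordinal(), type.ordinal());
    }
}

class CleanTaskPool {
    // Fixed number of workers; waiting tasks are ordered by priority.
    // Use execute(), not submit(): submit() wraps tasks in a non-comparable FutureTask.
    static ThreadPoolExecutor newPool(int concurrency) {
        return new ThreadPoolExecutor(concurrency, concurrency, 0L, TimeUnit.MILLISECONDS,
                new PriorityBlockingQueue<>());
    }

    // Synchronous variant: waits up to the timeout and returns false if the task
    // timed out or failed. The task itself is not cancelled on timeout.
    static boolean executeAndWait(ThreadPoolExecutor pool, ComparableCleanTask task,
                                  long timeout, TimeUnit unit) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        pool.execute(new ComparableCleanTask(task.type) {
            @Override
            public void run() {
                try {
                    task.run();
                    done.complete(null);
                } catch (RuntimeException e) {
                    done.completeExceptionally(e);
                }
            }
        });
        try {
            done.get(timeout, unit);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

An asynchronous caller would call `pool.execute(task)` directly and return
immediately. In this sketch a timed-out task keeps running in the background;
whether the real change cancels it is not stated in this comment.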

> The historical garbage cleanup task was not completed, causing the subsequent 
> scheduled garbage cleanup task cannot be executed normally
> 
>
> Key: KYLIN-5745
> URL: https://issues.apache.org/jira/browse/KYLIN-5745
> Project: Kylin
> Issue Type: Bug
> Affects Versions: 5.0-beta
> Reporter: zhong.zhu
> Assignee: zhong.zhu
> Priority: Major
> Fix For: 5.0.0
>
>
> {*}Problem description{*}: 
> Timed garbage cleanup operation cannot be completed successfully
> {*}Background{*}: 
> The customer found that Kylin has a large number of small files occupying 
> hdfs storage, we need to clean up, we check the customer's environment and 
> found that the timed garbage cleanup has not been completed properly, has 
> been timeout!
> *Troubleshooting:*
> After the check, it is found that the customer's garbage clearing is 
> triggered for the first time in the morning of 4.6 after KE is restarted on 
> the night of 4.5. After this clearing operation is triggered, the thread of 
> query history has been deleted since then. As a result, subsequent periodic 
> garbage clearing tasks cannot be completed
> Delete 2,000 rows of data at a time, one of the customer's projects need to 
> delete 550,000 query history, look at the kylin.log record, delete 
> time-consuming because of table locking problems lead to a delete operation 
> even reached more than 20 minutes!
> The following record is that the main thread of garbage collection is waiting 
> for the query history cleaning to complete, but the query history cleaning 
> has not been completed, and then the main thread timeout and exit.
> {code:shell}
> 2023-04-06T00:00:00,015 INFO  [RoutineOpsWorker-287] service.ScheduleService 
> : execute task 

[jira] [Created] (KYLIN-5745) The historical garbage cleanup task was not completed, causing the subsequent scheduled garbage cleanup task cannot be executed normally

2023-12-11 Thread zhong.zhu (Jira)
zhong.zhu created KYLIN-5745:


 Summary: The historical garbage cleanup task was not completed, 
causing the subsequent scheduled garbage cleanup task cannot be executed 
normally
 Key: KYLIN-5745
 URL: https://issues.apache.org/jira/browse/KYLIN-5745
 Project: Kylin
  Issue Type: Bug
Affects Versions: 5.0-beta
Reporter: zhong.zhu
Assignee: zhong.zhu
 Fix For: 5.0.0


{*}Problem description{*}: 
The scheduled garbage cleanup cannot complete successfully.


{*}Background{*}: 
The customer found that Kylin had a large number of small files occupying HDFS
storage that needed to be cleaned up. On checking the customer's environment,
we found that the scheduled garbage cleanup had never completed properly and
kept timing out.


*Troubleshooting:*
The check showed that, after KE was restarted on the night of April 5, garbage
cleanup was first triggered in the early morning of April 6. Once it was
triggered, the query history cleanup thread kept deleting from then on, so
subsequent periodic garbage cleanup tasks could not complete.

Query histories are deleted 2,000 rows at a time, and one of the customer's
projects had 550,000 query histories to delete. According to kylin.log, table
locking made the deletes slow, and a single delete operation took more than 20
minutes in some cases.

The following log shows the main garbage cleanup thread waiting for the query
history cleanup to complete. The cleanup did not complete, so the main thread
timed out and exited.


{code:shell}
2023-04-06T00:00:00,015 INFO  [RoutineOpsWorker-287] service.ScheduleService : execute task MetadataBackup with remaining time: 1435 ms
2023-04-06T00:01:52,649 INFO  [RoutineOpsWorker-287] service.ScheduleService : execute task QueryHistoriesCleanup with remaining time: 14287361 ms
...
2023-04-06T04:00:00,012 WARN  [DefaultTaskScheduler-3] service.ScheduleService : Routine task execution timeout
java.util.concurrent.TimeoutException: null
    at java.util.concurrent.FutureTask.get(FutureTask.java:205) ~[?:1.8.0_242]
    at org.apache.kylin.rest.service.ScheduleService.executeTask(ScheduleService.java:107) ~[kylin-job-service-5.0.0-ke-4.6.2.0.jar:?]
    at org.apache.kylin.rest.service.ScheduleService.routineTask(ScheduleService.java:77) ~[kylin-job-service-5.0.0-ke-4.6.2.0.jar:?]
    at org.apache.kylin.rest.service.ScheduleService$$FastClassBySpringCGLIB$$afbfc46c.invoke() ~[kylin-job-service-5.0.0-ke-4.6.2.0.jar:?]
{code}

The following log runs up to the latest time available. After 09:00 the query
history deletion was still in progress; it did not stop when the main thread
exited.
{code:shell}
2023-04-06T00:08:43,015 DEBUG [QueryHistoryCleanWorker-23145] QueryHistoryMapper.selectByProject : <==  Total: 12
2023-04-06T00:08:43,016 INFO  [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project is less than the maximum limit, so skip it.
2023-04-06T00:08:43,016 INFO  [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project is less than the maximum limit, so skip it.
2023-04-06T00:08:43,016 INFO  [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project is less than the maximum limit, so skip it.
2023-04-06T00:08:43,016 INFO  [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Query histories of project is less than the maximum limit, so skip it.
2023-04-06T00:08:43,017 INFO  [QueryHistoryCleanWorker-23145] util.QueryHisStoreUtil : Start to delete query histories that are beyond max size for project, records:1551669
...
2023-04-06T09:03:54,974 INFO  [QueryHistoryCleanWorker-23145] query.JdbcQueryHistoryStore : Delete 2000 row query history for project [CXCZH] takes 938060 ms
2023-04-06T09:03:54,975 DEBUG [QueryHistoryCleanWorker-23145] QueryHistoryMapper.delete : ==>  Preparing: delete from ke4_instance_query_history_realization where query_time < ? and project_name = ?
2023-04-06T09:03:54,975 DEBUG [QueryHistoryCleanWorker-23145] QueryHistoryMapper.delete : ==> Parameters: 1678863450091(Long), CXCZH(String)
{code}


 


