[jira] [Commented] (KYLIN-5767) Calculating total rows abnormal when jdbc datasource is connnected

2024-03-31 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832724#comment-17832724
 ] 

ASF subversion and git services commented on KYLIN-5767:


Commit d3b4469484a79102071db26114f6602b30d70f43 in kylin's branch 
refs/heads/kylin5 from Liang.Hua
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=d3b4469484 ]

KYLIN-5767 Calculate total rows wrongly when connecting jdbc datasource

-
Co-authored-by: liang.hua 


> Calculating total rows abnormal when jdbc datasource is connnected
> --
>
> Key: KYLIN-5767
> URL: https://issues.apache.org/jira/browse/KYLIN-5767
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: 5.0-beta
>Reporter: pengfei.zhan
>Assignee: pengfei.zhan
>Priority: Major
> Fix For: 5.0.0
>
>
> {{When a JDBC data source is connected, snapshot management is enabled, and 
> the dimension table has not been sampled, optimize the build logic so that 
> the job runs normally even when the dimension table holds a large volume of 
> data.}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KYLIN-5767) Calculating total rows abnormal when jdbc datasource is connnected

2024-03-29 Thread pengfei.zhan (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832357#comment-17832357
 ] 

pengfei.zhan commented on KYLIN-5767:
-

h1. Design


Add the method `getCountData` to 
org.apache.kylin.source.jdbc.ISourceConnector, so that the SQL `select 
count(*) from table` is executed by the data source itself.
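
A minimal sketch of the idea, for illustration only. It uses plain JDBC; the class name, the `buildCountSql` helper and the method signature are hypothetical and are not taken from the actual `ISourceConnector` interface or from the commit. The point is that the data source evaluates `count(*)` and returns a single row, so no table rows are transferred to Kylin.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch; not the real Kylin ISourceConnector implementation.
class CountPushdownSketch {

    // Hypothetical helper: builds the count query for a fully qualified table.
    // Identifier quoting and validation are omitted for brevity.
    static String buildCountSql(String database, String table) {
        return "select count(*) from " + database + "." + table;
    }

    // Hypothetical stand-in for the proposed getCountData: the data source
    // computes the count and only one row comes back over the connection.
    static long getCountData(Connection conn, String database, String table)
            throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(buildCountSql(database, table))) {
            // count(*) always yields one row; fall back to 0 defensively.
            return rs.next() ? rs.getLong(1) : 0L;
        }
    }
}
```

Compared with issuing `select *` and counting the rows on the client side, the cost of this query no longer grows with the amount of data that has to be moved out of the source.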



[jira] [Commented] (KYLIN-5767) Calculating total rows abnormal when jdbc datasource is connnected

2024-03-29 Thread pengfei.zhan (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832356#comment-17832356
 ] 

pengfei.zhan commented on KYLIN-5767:
-

h1. Problem

With Snapshot Management turned on, the Model Snapshot Build task does not 
skip the Build Snapshot step. Kylin reads the table's sampling information 
via "tableManager.getTableExtIfExists(tableDesc)". If the table's sampling 
information is empty or the number of sampled rows is 0, Kylin calculates 
the total rows itself. That calculation is run against the JDBC data source 
using "select *". If a table in the customer environment is very large, this 
step of the build takes too long. This usually happens when the dimension 
table is large and its partition column is null.
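
The condition described above can be sketched as follows. This is an illustration under stated assumptions, not Kylin's code: the class and method names are hypothetical, the sampled row count is modelled as a nullable `Long`, and the sketch assumes the sampled count is reused whenever it is present and non-zero.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of the fallback decision; names are not from Kylin.
class TotalRowsSketch {

    // sampledRows is null when the table has no sampling information.
    // countFromSource stands for the (potentially expensive) query that is
    // sent to the JDBC data source.
    static long resolveTotalRows(Long sampledRows, LongSupplier countFromSource) {
        if (sampledRows == null || sampledRows == 0L) {
            // No usable sampling result: ask the data source for the count.
            return countFromSource.getAsLong();
        }
        // Assumption: an existing sampling result is used as-is.
        return sampledRows;
    }
}
```

The supplier is only invoked in the fallback branch, which is the branch whose cost this issue is about.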
