[jira] [Commented] (KYLIN-5268) kylin4.0.1服务每天都会内存溢出java.lang.OutOfMemoryError: Java heap space

2022-10-01 Thread liyang (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17612010#comment-17612010
 ] 

liyang commented on KYLIN-5268:
---

> 如果tableau发出的奇怪sql不多固定,似乎修改源码的 org.apache.kylin.rest.util.TableauInterceptor 
> 类更合理

+1

> kylin4.0.1服务每天都会内存溢出java.lang.OutOfMemoryError: Java heap space
> ---
>
> Key: KYLIN-5268
> URL: https://issues.apache.org/jira/browse/KYLIN-5268
> Project: Kylin
>  Issue Type: Bug
>Reporter: zhenkuan_zhang
>Priority: Major
> Attachments: kylin_query.log
>
>
> Hi,最近发现我们的kylin服务非常不稳定,三个节点,每天都会有节点出现内存溢出的状况,jvm从16G调到48G也同样会内存溢出。通过分析Dump 出 
> java_pid34184.hprof文件,发现内存会被两个查询完全占满导致jvm内存溢出,sql是通过tableau发送来的。sql如下(该事实表行数上亿):
> 1.
> SELECT "自訂 SQL 查詢"."FAIL" AS "FAIL" FROM (   
>   SELECT DT  TEST_DATE,TEST_STATION_CODE,SUBSTRING(TEST_LINE,1,6) 
> TEST_FLOOR,TEST_LINE,CONFIG,REGION,LOCALLIZATION,
>   SUBSTRING(ATL FROM  1 for POSITION('*-' IN ATL from 1)-1 ) 
> VENDERATL,SUBSTRING(ATL FROM POSITION('-' IN ATL FROM 5)+1 FOR 3) 
> WEEKATL,SUBSTRING(ATL FROM POSITION('-' IN ATL FROM 6)+1 FOR 4) 
> DATEATL,SUBSTRING(ATL FROM POSITION('*-' IN ATL FROM 1)+7 FOR 4) ATL ,
>   INPUT,FAIL,RETEST FROM BI_DW.KPY_FACT_SUM_DATA
>   ) "自訂 SQL 查詢"
> 2.
> SELECT "自訂 SQL 查詢"."RETEST" AS "RETEST" FROM (   
>   SELECT DT  TEST_DATE,TEST_STATION_CODE,SUBSTRING(TEST_LINE,1,6) 
> TEST_FLOOR,TEST_LINE,CONFIG,REGION,LOCALLIZATION,
>   SUBSTRING(ATL FROM  1 for POSITION('*-' IN ATL from 1)-1 ) 
> VENDERATL,SUBSTRING(ATL FROM POSITION('-' IN ATL FROM 5)+1 FOR 3) 
> WEEKATL,SUBSTRING(ATL FROM POSITION('-' IN ATL FROM 6)+1 FOR 4) 
> DATEATL,SUBSTRING(ATL FROM POSITION('*-' IN ATL FROM 1)+7 FOR 4) ATL ,
>   INPUT,FAIL,RETEST FROM BI_DW.KPY_FACT_SUM_DATA
>   ) "自訂 SQL 查詢"
> 一方面我们在与tableau方联系,沟通为什么会发出这么奇怪的sql。
> 另一方面希望得到kylin的支持,看看有没有什么解决方法,
> 比如kylin.query.force-limit=1000 可以为 select * from table 加上 limit 1000,有没有办法限制 
> select a from table。
> 或者有没有其他的解决方法。
> 期待回复,十分感谢!!!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KYLIN-5268) kylin4.0.1服务每天都会内存溢出java.lang.OutOfMemoryError: Java heap space

2022-09-20 Thread Liu Zhao (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17607490#comment-17607490
 ] 

Liu Zhao commented on KYLIN-5268:
-

hi,我刚刚看了下,如果tableau发出的奇怪sql不多固定,似乎修改源码的 
org.apache.kylin.rest.util.TableauInterceptor 类更合理,相较于修改 QueryUtil.java 
强制为没有limit的sql加limit,前者可以直接拦截不提交spark查询作业,后者还是会发起spark查询作业有资源消耗。


{code:java}
// QueryService.java
conn = QueryConnection.getConnection(sqlRequest.getProject());
String userInfo = 
SecurityContextHolder.getContext().getAuthentication().getName();
QueryContext context = QueryContextFacade.current();
context.setUsername(userInfo);
context.setGroups(AclPermissionUtil.getCurrentUserGroups());
context.setProject(sqlRequest.getProject());
final Collection grantedAuthorities = 
SecurityContextHolder.getContext()
.getAuthentication().getAuthorities();
for (GrantedAuthority grantedAuthority : grantedAuthorities) {
userInfo += ",";
userInfo += grantedAuthority.getAuthority();
}

SQLResponse fakeResponse = 
TableauInterceptor.tableauIntercept(sqlRequest.getSql());
if (null != fakeResponse) {
logger.debug("Return fake response, is exception? " + 
fakeResponse.getIsException());
return fakeResponse;
}
String correctedSql = QueryUtil.massageSql(sqlRequest.getSql(), 
sqlRequest.getProject(),
sqlRequest.getLimit(), sqlRequest.getOffset(), 
conn.getSchema(), Constant.FakeCatalogName);
if (!correctedSql.equals(sqlRequest.getSql())) {
logger.info("The corrected query: " + correctedSql);

//CAUTION: should not change sqlRequest content!
//sqlRequest.setSql(correctedSql);
}
{code}

 

> kylin4.0.1服务每天都会内存溢出java.lang.OutOfMemoryError: Java heap space
> ---
>
> Key: KYLIN-5268
> URL: https://issues.apache.org/jira/browse/KYLIN-5268
> Project: Kylin
>  Issue Type: Bug
>Reporter: zhenkuan_zhang
>Priority: Major
> Attachments: kylin_query.log
>
>
> Hi,最近发现我们的kylin服务非常不稳定,三个节点,每天都会有节点出现内存溢出的状况,jvm从16G调到48G也同样会内存溢出。通过分析Dump 出 
> java_pid34184.hprof文件,发现内存会被两个查询完全占满导致jvm内存溢出,sql是通过tableau发送来的。sql如下(该事实表行数上亿):
> 1.
> SELECT "自訂 SQL 查詢"."FAIL" AS "FAIL" FROM (   
>   SELECT DT  TEST_DATE,TEST_STATION_CODE,SUBSTRING(TEST_LINE,1,6) 
> TEST_FLOOR,TEST_LINE,CONFIG,REGION,LOCALLIZATION,
>   SUBSTRING(ATL FROM  1 for POSITION('*-' IN ATL from 1)-1 ) 
> VENDERATL,SUBSTRING(ATL FROM POSITION('-' IN ATL FROM 5)+1 FOR 3) 
> WEEKATL,SUBSTRING(ATL FROM POSITION('-' IN ATL FROM 6)+1 FOR 4) 
> DATEATL,SUBSTRING(ATL FROM POSITION('*-' IN ATL FROM 1)+7 FOR 4) ATL ,
>   INPUT,FAIL,RETEST FROM BI_DW.KPY_FACT_SUM_DATA
>   ) "自訂 SQL 查詢"
> 2.
> SELECT "自訂 SQL 查詢"."RETEST" AS "RETEST" FROM (   
>   SELECT DT  TEST_DATE,TEST_STATION_CODE,SUBSTRING(TEST_LINE,1,6) 
> TEST_FLOOR,TEST_LINE,CONFIG,REGION,LOCALLIZATION,
>   SUBSTRING(ATL FROM  1 for POSITION('*-' IN ATL from 1)-1 ) 
> VENDERATL,SUBSTRING(ATL FROM POSITION('-' IN ATL FROM 5)+1 FOR 3) 
> WEEKATL,SUBSTRING(ATL FROM POSITION('-' IN ATL FROM 6)+1 FOR 4) 
> DATEATL,SUBSTRING(ATL FROM POSITION('*-' IN ATL FROM 1)+7 FOR 4) ATL ,
>   INPUT,FAIL,RETEST FROM BI_DW.KPY_FACT_SUM_DATA
>   ) "自訂 SQL 查詢"
> 一方面我们在与tableau方联系,沟通为什么会发出这么奇怪的sql。
> 另一方面希望得到kylin的支持,看看有没有什么解决方法,
> 比如kylin.query.force-limit=1000 可以为 select * from table 加上 limit 1000,有没有办法限制 
> select a from table。
> 或者有没有其他的解决方法。
> 期待回复,十分感谢!!!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KYLIN-5268) kylin4.0.1服务每天都会内存溢出java.lang.OutOfMemoryError: Java heap space

2022-09-20 Thread zhenkuan_zhang (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17607475#comment-17607475
 ] 

zhenkuan_zhang commented on KYLIN-5268:
---

非常感谢您的回复。
根据您提供的方案,我们目前在跟tableau方沟通,
并设置 spark.driver.maxResultSize =500m,kylin.query.timeout-seconds=180,
关于kylin.query.force-limit 我们设置成了 5, 
并修改了源码query/src/main/java/org/apache/kylin/query/util/QueryUtil.java,把判断sql是否包含*的判断去掉了,自动为所有sql加limit。
今天没有出现OOM的状况,我们会再持续观察几天,如果有更好的方案,希望可以再通知我们,谢谢!

> kylin4.0.1服务每天都会内存溢出java.lang.OutOfMemoryError: Java heap space
> ---
>
> Key: KYLIN-5268
> URL: https://issues.apache.org/jira/browse/KYLIN-5268
> Project: Kylin
>  Issue Type: Bug
>Reporter: zhenkuan_zhang
>Priority: Major
> Attachments: kylin_query.log
>
>
> Hi,最近发现我们的kylin服务非常不稳定,三个节点,每天都会有节点出现内存溢出的状况,jvm从16G调到48G也同样会内存溢出。通过分析Dump 出 
> java_pid34184.hprof文件,发现内存会被两个查询完全占满导致jvm内存溢出,sql是通过tableau发送来的。sql如下(该事实表行数上亿):
> 1.
> SELECT "自訂 SQL 查詢"."FAIL" AS "FAIL" FROM (   
>   SELECT DT  TEST_DATE,TEST_STATION_CODE,SUBSTRING(TEST_LINE,1,6) 
> TEST_FLOOR,TEST_LINE,CONFIG,REGION,LOCALLIZATION,
>   SUBSTRING(ATL FROM  1 for POSITION('*-' IN ATL from 1)-1 ) 
> VENDERATL,SUBSTRING(ATL FROM POSITION('-' IN ATL FROM 5)+1 FOR 3) 
> WEEKATL,SUBSTRING(ATL FROM POSITION('-' IN ATL FROM 6)+1 FOR 4) 
> DATEATL,SUBSTRING(ATL FROM POSITION('*-' IN ATL FROM 1)+7 FOR 4) ATL ,
>   INPUT,FAIL,RETEST FROM BI_DW.KPY_FACT_SUM_DATA
>   ) "自訂 SQL 查詢"
> 2.
> SELECT "自訂 SQL 查詢"."RETEST" AS "RETEST" FROM (   
>   SELECT DT  TEST_DATE,TEST_STATION_CODE,SUBSTRING(TEST_LINE,1,6) 
> TEST_FLOOR,TEST_LINE,CONFIG,REGION,LOCALLIZATION,
>   SUBSTRING(ATL FROM  1 for POSITION('*-' IN ATL from 1)-1 ) 
> VENDERATL,SUBSTRING(ATL FROM POSITION('-' IN ATL FROM 5)+1 FOR 3) 
> WEEKATL,SUBSTRING(ATL FROM POSITION('-' IN ATL FROM 6)+1 FOR 4) 
> DATEATL,SUBSTRING(ATL FROM POSITION('*-' IN ATL FROM 1)+7 FOR 4) ATL ,
>   INPUT,FAIL,RETEST FROM BI_DW.KPY_FACT_SUM_DATA
>   ) "自訂 SQL 查詢"
> 一方面我们在与tableau方联系,沟通为什么会发出这么奇怪的sql。
> 另一方面希望得到kylin的支持,看看有没有什么解决方法,
> 比如kylin.query.force-limit=1000 可以为 select * from table 加上 limit 1000,有没有办法限制 
> select a from table。
> 或者有没有其他的解决方法。
> 期待回复,十分感谢!!!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KYLIN-5268) kylin4.0.1服务每天都会内存溢出java.lang.OutOfMemoryError: Java heap space

2022-09-20 Thread Liu Zhao (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17607283#comment-17607283
 ] 

Liu Zhao commented on KYLIN-5268:
-

1. 从tableau角度解决根因
2. 可以从github上拿源码自行编译,对tableau奇怪的sql拦截或者使kylin.query.force-limit对所有sql生效
3. 是否可以在 tableau kylin之间增加一层代理拦截异常sql

> kylin4.0.1服务每天都会内存溢出java.lang.OutOfMemoryError: Java heap space
> ---
>
> Key: KYLIN-5268
> URL: https://issues.apache.org/jira/browse/KYLIN-5268
> Project: Kylin
>  Issue Type: Bug
>Reporter: zhenkuan_zhang
>Priority: Major
> Attachments: kylin_query.log
>
>
> Hi,最近发现我们的kylin服务非常不稳定,三个节点,每天都会有节点出现内存溢出的状况,jvm从16G调到48G也同样会内存溢出。通过分析Dump 出 
> java_pid34184.hprof文件,发现内存会被两个查询完全占满导致jvm内存溢出,sql是通过tableau发送来的。sql如下(该事实表行数上亿):
> 1.
> SELECT "自訂 SQL 查詢"."FAIL" AS "FAIL" FROM (   
>   SELECT DT  TEST_DATE,TEST_STATION_CODE,SUBSTRING(TEST_LINE,1,6) 
> TEST_FLOOR,TEST_LINE,CONFIG,REGION,LOCALLIZATION,
>   SUBSTRING(ATL FROM  1 for POSITION('*-' IN ATL from 1)-1 ) 
> VENDERATL,SUBSTRING(ATL FROM POSITION('-' IN ATL FROM 5)+1 FOR 3) 
> WEEKATL,SUBSTRING(ATL FROM POSITION('-' IN ATL FROM 6)+1 FOR 4) 
> DATEATL,SUBSTRING(ATL FROM POSITION('*-' IN ATL FROM 1)+7 FOR 4) ATL ,
>   INPUT,FAIL,RETEST FROM BI_DW.KPY_FACT_SUM_DATA
>   ) "自訂 SQL 查詢"
> 2.
> SELECT "自訂 SQL 查詢"."RETEST" AS "RETEST" FROM (   
>   SELECT DT  TEST_DATE,TEST_STATION_CODE,SUBSTRING(TEST_LINE,1,6) 
> TEST_FLOOR,TEST_LINE,CONFIG,REGION,LOCALLIZATION,
>   SUBSTRING(ATL FROM  1 for POSITION('*-' IN ATL from 1)-1 ) 
> VENDERATL,SUBSTRING(ATL FROM POSITION('-' IN ATL FROM 5)+1 FOR 3) 
> WEEKATL,SUBSTRING(ATL FROM POSITION('-' IN ATL FROM 6)+1 FOR 4) 
> DATEATL,SUBSTRING(ATL FROM POSITION('*-' IN ATL FROM 1)+7 FOR 4) ATL ,
>   INPUT,FAIL,RETEST FROM BI_DW.KPY_FACT_SUM_DATA
>   ) "自訂 SQL 查詢"
> 一方面我们在与tableau方联系,沟通为什么会发出这么奇怪的sql。
> 另一方面希望得到kylin的支持,看看有没有什么解决方法,
> 比如kylin.query.force-limit=1000 可以为 select * from table 加上 limit 1000,有没有办法限制 
> select a from table。
> 或者有没有其他的解决方法。
> 期待回复,十分感谢!!!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KYLIN-5268) kylin4.0.1服务每天都会内存溢出java.lang.OutOfMemoryError: Java heap space

2022-09-19 Thread zhenkuan_zhang (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606831#comment-17606831
 ] 

zhenkuan_zhang commented on KYLIN-5268:
---

2022-09-15 21:46:54,588 INFO  [BadQueryDetector] service.BadQueryDetector:204 : 
Slow query has been running 1328.617 seconds (project:KPY_RANGER, thread: 0x42, 
user:h26, query id:fb8d154f-726c-3f74-bfba-486b484952a9) -- SELECT "自訂 SQL 
查詢"."FAIL" AS "FAIL" FROM ()"自訂 SQL 查詢"
2022-09-15 21:46:54,647 DEBUG [BadQueryDetector] 
badquery.BadQueryHistoryManager:65 : Loaded 50 Bad Query(s)
2022-09-15 21:46:54,690 INFO  [localhost-startStop-1-EventThread] 
state.ConnectionStateManager:228 : State change: LOST
2022-09-15 21:46:54,692 ERROR [localhost-startStop-1-EventThread] 
imps.CuratorFrameworkImpl:557 : Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728)
at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:516)
at 
org.apache.curator.framework.imps.BackgroundSyncImpl$1.processResult(BackgroundSyncImpl.java:53)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:609)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2022-09-15 21:46:54,707 INFO  [BadQueryDetector] service.BadQueryDetector:192 : 
Problematic thread 0x42 Query fb8d154f-726c-3f74-bfba-486b484952a9-66, query 
id: fb8d154f-726c-3f74-bfba-486b484952a9
at 
org.apache.spark.sql.execution.SparkPlan$$anon$1.next(SparkPlan.scala:284)
at 
org.apache.spark.sql.execution.SparkPlan$$anon$1.next(SparkPlan.scala:278)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at 
org.apache.spark.sql.execution.SparkPlan$$anon$1.foreach(SparkPlan.scala:278)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeCollect$1.apply(SparkPlan.scala:300)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeCollect$1.apply(SparkPlan.scala:299)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at 
org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:299)
at 
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3389)

2022-09-15 21:46:54,708 INFO  [BadQueryDetector] service.BadQueryDetector:204 : 
Slow query has been running 1328.68 seconds 

> kylin4.0.1服务每天都会内存溢出java.lang.OutOfMemoryError: Java heap space
> ---
>
> Key: KYLIN-5268
> URL: https://issues.apache.org/jira/browse/KYLIN-5268
> Project: Kylin
>  Issue Type: Bug
>Reporter: zhenkuan_zhang
>Priority: Major
> Attachments: kylin_query.log
>
>
> Hi,最近发现我们的kylin服务非常不稳定,三个节点,每天都会有节点出现内存溢出的状况,jvm从16G调到48G也同样会内存溢出。通过分析Dump 出 
> java_pid34184.hprof文件,发现内存会被两个查询完全占满导致jvm内存溢出,sql是通过tableau发送来的。sql如下(该事实表行数上亿):
> 1.
> SELECT "自訂 SQL 查詢"."FAIL" AS "FAIL" FROM (   
>   SELECT DT  TEST_DATE,TEST_STATION_CODE,SUBSTRING(TEST_LINE,1,6) 
> TEST_FLOOR,TEST_LINE,CONFIG,REGION,LOCALLIZATION,
>   SUBSTRING(ATL FROM  1 for POSITION('*-' IN ATL from 1)-1 ) 
> VENDERATL,SUBSTRING(ATL FROM POSITION('-' IN ATL FROM 5)+1 FOR 3) 
> WEEKATL,SUBSTRING(ATL FROM POSITION('-' IN ATL FROM 6)+1 FOR 4) 
> DATEATL,SUBSTRING(ATL FROM POSITION('*-' IN ATL FROM 1)+7 FOR 4) ATL ,
>   INPUT,FAIL,RETEST FROM BI_DW.KPY_FACT_SUM_DATA
>   ) "自訂 SQL 查詢"
> 2.
> SELECT "自訂 SQL 查詢"."RETEST" AS "RETEST" FROM (   
>   SELECT DT  TEST_DATE,TEST_STATION_CODE,SUBSTRING(TEST_LINE,1,6) 
> TEST_FLOOR,TEST_LINE,CONFIG,REGION,LOCALLIZATION,
>   SUBSTRING(ATL FROM  1 for POSITION('*-' IN ATL from 1)-1 ) 
> VENDERATL,SUBSTRING(ATL FROM POSITION('-' IN ATL FROM 5)+1 FOR 3) 
> WEEKATL,SUBSTRING(ATL FROM POSITION('-' IN ATL FROM 6)+1 FOR 4) 
> DATEATL,SUBSTRING(ATL FROM POSITION('*-' IN ATL FROM 1)+7 FOR 4) ATL ,
>   INPUT,FAIL,RETEST FROM BI_DW.KPY_FACT_SUM_DATA
>   ) "自訂 SQL 查詢"
> 一方面我们在与tableau方联系,沟通为什么会发出这么奇怪的sql。
> 另一方面希望得到kylin的支持,看看有没有什么解决方法,
> 比如kylin.query.force-limit=1000 可以为 select * from table 加上 limit 1000,有没有办法限制 
> select a from table。
> 或者有没有其他的解决方法。



--
This message was sent by Atlassian Jira
(v8.20.10#820010)