[ https://issues.apache.org/jira/browse/KYLIN-4432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shao Feng Shi resolved KYLIN-4432. ---------------------------------- Resolution: Fixed > duplicated queries with sytax error take unexpect long time when lazy query > enabled > ----------------------------------------------------------------------------------- > > Key: KYLIN-4432 > URL: https://issues.apache.org/jira/browse/KYLIN-4432 > Project: Kylin > Issue Type: Bug > Reporter: Congling Xia > Assignee: Congling Xia > Priority: Major > Fix For: v3.1.0 > > > Hi! Our Kylin server becomes unresponsive recently. All query threads were > being TIMED_WAITING so that no more thread in the thread pool of Tomcat could > response new requests. The server logs are full of "Duplicate SQL request"s, > and many requests retrying for so long time: > {code:java} > $ cat kylin.log.25 | grep 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991 | head > -3 > 2020-03-23 12:15:32,279 INFO [Query > 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991] service.QueryService:571 : > Duplicated SQL request is running, waiting... > 2020-03-23 12:15:32,379 INFO [Query > 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991] service.QueryService:571 : > Duplicated SQL request is running, waiting... > 2020-03-23 12:15:32,479 INFO [Query > 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991] service.QueryService:571 : > Duplicated SQL request is running, waiting... > $ cat kylin.log.25 | grep 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991 | tail > -3 > 2020-03-23 12:26:56,920 INFO [Query > 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991] service.QueryService:571 : > Duplicated SQL request is running, waiting... > 2020-03-23 12:26:57,020 INFO [Query > 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991] service.QueryService:571 : > Duplicated SQL request is running, waiting... > 2020-03-23 12:26:57,121 INFO [Query > 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991] service.QueryService:571 : > Duplicated SQL request is running, waiting... > {code} > This can be re-produced easily: > * enable lazy query (KYLIN-2897) by setting environment variable > kylin.query.cache-enabled=true and kylin.query.lazy-query-enabled=true > * send the same query with syntax error many times in several minutes > The first query will quickly response with error message, and the following > queries will executing for long time and will repeatly output "Duplicated SQL > request is running, waiting...". > The code in org.apache.kylin.rest.service.QueryService#searchQueryInCache > indicates duplicated query will wait at most one minute with default setting > of LazyQueryWaitingTimeoutMilliSeconds but in fact it waits quite longer than > that. > {code:java} > while (response.isRunning()) { > // Wait at most one minute > if (System.currentTimeMillis() - response.getLazyQueryStartTime() >= > getConfig() > .getLazyQueryWaitingTimeoutMilliSeconds()) { > cache.evict(sqlRequest.getCacheKey()); > return null; > } > logger.info("Duplicated SQL request is running, waiting..."); > try { > Thread.sleep(100L); > } catch (InterruptedException e) { > } > wrapper = cache.get(sqlRequest.getCacheKey()); > if (wrapper == null) { > return null; > } > response = (SQLResponse) wrapper.get(); > if (response == null) { > return null; > } > } > {code} > After some work-around, I find that the dummy response status remains > unchanged after execution with exception. So the 2nd query will wait > LazyQueryWaitingTimeoutMilliSeconds, evict the dummy response created by the > 1st query from cache, and put a new dummy response into the cache. And the > 3rd query may use the new dummy response's start time to compute timeout, and > will wait for another LazyQueryWaitingTimeoutMilliSeconds. > Suppose we send _n_ identical bad queries to the server quickly. The last > query may wait for nearly _(n-1)_ times of > LazyQueryWaitingTimeoutMilliSeconds. That's why the Web container's thread > pool was exhausted – user program uses bad queries to check availability of > Kylin service and retries many times in a short time. -- This message was sent by Atlassian Jira (v8.3.4#803005)