Dear Sir,

If you are using Real-time OLAP, please check this issue: https://issues.apache.org/jira/browse/KYLIN-4396 (patch: https://github.com/apache/kylin/pull/1134). It is a file descriptor (FD) leak that I found early this year. In a cloud environment, an FD leak can turn into a connection leak, am I right?

If you think it is a connection leak that may have another cause, please send us your network statistics, for example the output of "netstat -anp".

Good luck to you!
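
As a rough aid for reading the "netstat -anp" output, here is a minimal Python sketch that counts TCP connections per state. The function name and the sample rows are hypothetical and only for illustration; they are not taken from the affected cluster. A large and growing number of CLOSE_WAIT sockets owned by the Kylin process would be consistent with connections that the application has not closed, though it is not proof of a leak on its own.

```python
from collections import Counter


def tcp_state_counts(netstat_output: str) -> Counter:
    """Count TCP connections per state in `netstat -anp` style text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows: proto, recv-q, send-q, local addr, foreign addr, state, pid/program
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


# Hypothetical sample output, made up for illustration only.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.0.0.5:41234          52.216.1.10:443         ESTABLISHED 4321/java
tcp        1      0 10.0.0.5:41236          52.216.1.10:443         CLOSE_WAIT  4321/java
tcp        1      0 10.0.0.5:41238          52.216.1.11:443         CLOSE_WAIT  4321/java
tcp        0      0 0.0.0.0:7070            0.0.0.0:*               LISTEN      4321/java
udp        0      0 0.0.0.0:68              0.0.0.0:*                           812/dhclient
"""

if __name__ == "__main__":
    # Print states from most to least frequent.
    for state, n in tcp_state_counts(SAMPLE).most_common():
        print(state, n)
```

To use it on a real host, save the command output to a file and pass the file contents to tcp_state_counts instead of SAMPLE. Comparing the counts at two points in time shows whether any state keeps growing.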
--
Best wishes to you!
From: Xiaoxiang Yu

At 2020-07-21 20:42:53, "Andras Nagy" <[email protected]> wrote:

Dear All,

We ran into an issue where, after an extended uptime, both the Kylin query server and jobs running on EMR stop working. The root cause on both sides is this exception:

Caused by: java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
    at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:257) ~[emrfs-hadoop-assembly-2.37.0.jar:?]

In our setup, S3 is used both for intermediate data storage and for persistence under HBase.

Based on https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/, increasing the connection pool size (the fs.s3.maxConnections property) to 10 000 only delays the issue, so the underlying problem is likely a connection leak. The fact that restarting the Kylin service solves the problem also points to a leak.

We opened a ticket for the issue (https://issues.apache.org/jira/browse/KYLIN-4500); a full stack trace from the QueryService is attached to it.

Since this is seriously affecting our production service, any hint would be much appreciated. Is there any chance someone could look into this?

Many thanks,
Andras
