Dear All,

We have run into an issue where, after an extended uptime, both the Kylin query server and the jobs running on EMR stop working. The root cause on both sides is this exception:
Caused by: java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
    at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:257) ~[emrfs-hadoop-assembly-2.37.0.jar:?]

In our setup, S3 is used both for intermediate data storage and as the persistence layer under HBase. Following https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/ we increased the connection pool size (the fs.s3.maxConnections property) to 10,000, but this only delays the issue, so the underlying cause is likely a connection leak. The fact that restarting the Kylin service resolves the problem also points to a leak.

We opened a ticket for the issue: https://issues.apache.org/jira/browse/KYLIN-4500. A full stack trace from the QueryService is attached to the ticket.

Since this is seriously affecting our production service, any hint would be much appreciated. Is there any chance someone could look into this?

Many thanks,
Andras