Dear sir,
  If you are using Real-time OLAP, you may want to check this issue:
https://issues.apache.org/jira/browse/KYLIN-4396; the patch is at
https://github.com/apache/kylin/pull/1134. It is a file descriptor (FD) leak
that I found early this year. In a cloud environment, an FD leak can turn into
a connection leak, am I right?
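One way to check whether a process is leaking descriptors is to sample its open-FD count over time. The sketch below assumes a Linux host, where `/proc/<pid>/fd` lists a process's open descriptors (the same thing `ls /proc/<pid>/fd | wc -l` counts). The helper names `count_open_fds`, `sample_fd_counts` and `looks_like_leak` are hypothetical, and the "never drops, ends higher" rule is only a crude heuristic, not a Kylin tool.

```python
import os
import time


def count_open_fds(pid: int, proc_root: str = "/proc") -> int:
    """Return how many descriptors are listed under <proc_root>/<pid>/fd (Linux /proc layout)."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def sample_fd_counts(pid: int, samples: int = 5, interval_s: float = 60.0,
                     proc_root: str = "/proc") -> list:
    """Take `samples` FD counts, `interval_s` seconds apart."""
    counts = []
    for i in range(samples):
        counts.append(count_open_fds(pid, proc_root))
        if i < samples - 1:
            time.sleep(interval_s)
    return counts


def looks_like_leak(counts: list) -> bool:
    """Crude heuristic: the count never drops and ends higher than it started."""
    never_drops = all(b >= a for a, b in zip(counts, counts[1:]))
    return len(counts) > 1 and never_drops and counts[-1] > counts[0]
```

On the Kylin host you might call `looks_like_leak(sample_fd_counts(<kylin pid>))`; reading another process's `/proc/<pid>/fd` requires running as the same user or as root.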
  If you think the connection leak may have some other cause, please share
your network statistics, for example the output of "netstat -anp".
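To make a large `netstat -anp` dump easier to read, the TCP rows can be grouped by state and by remote host. This is a minimal sketch assuming the usual Linux net-tools column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name); `summarize_netstat` is a hypothetical helper and `SAMPLE` is made-up output, not data from this cluster. A large and growing number of sockets to the S3 endpoint, especially in CLOSE_WAIT (peer has closed, local application has not), would be consistent with a leak.

```python
from collections import Counter

# Made-up example output; replace with the real `netstat -anp` text.
SAMPLE = """Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.0.0.5:41234          52.216.1.10:443         CLOSE_WAIT  4321/java
tcp        0      0 10.0.0.5:41236          52.216.1.10:443         CLOSE_WAIT  4321/java
tcp        0      0 10.0.0.5:7070           10.0.0.9:53122          ESTABLISHED 4321/java
udp        0      0 0.0.0.0:68              0.0.0.0:*                           812/dhclient
"""


def summarize_netstat(output: str):
    """Count TCP sockets by state and by (remote host, state)."""
    by_state = Counter()
    by_remote = Counter()
    for line in output.splitlines():
        fields = line.split()
        # Keep only tcp/tcp6 rows; headers and udp rows are skipped.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign, state = fields[4], fields[5]
        remote_host = foreign.rsplit(":", 1)[0]  # drop the port
        by_state[state] += 1
        by_remote[(remote_host, state)] += 1
    return by_state, by_remote


by_state, by_remote = summarize_netstat(SAMPLE)
for (host, state), n in by_remote.most_common(5):
    print(f"{n:6d}  {state:<12} {host}")
```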
  Good luck to you!

--

Best wishes to you!
From: Xiaoxiang Yu

At 2020-07-21 20:42:53, "Andras Nagy" <[email protected]> wrote:

Dear All,

We ran into an issue where, after an extended uptime, both the Kylin query
server and the jobs running on EMR stop working. On both sides, the root cause
is this exception:

Caused by: java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
        at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:257) ~[emrfs-hadoop-assembly-2.37.0.jar:?]

In our setup, S3 is used both for intermediate data storage and for HBase
persistence.

Based on
https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/,
we increased the connection pool size (the fs.s3.maxConnections property) to
10,000, but this only delays the issue, so the underlying problem is likely a
connection leak. The fact that restarting the Kylin service solves the problem
also points to a leak.
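Why a bigger pool only buys time: if a fixed fraction of requests never return their connection, a pool of size N is exhausted after roughly N times the leak interval, whatever N is. The toy model below illustrates this under the assumption of a constant leak rate; it is not EMRFS or AWS SDK code, and `BoundedPool`, `PoolTimeout` and `requests_until_exhausted` are hypothetical names.

```python
import threading


class PoolTimeout(Exception):
    """Stand-in for 'Timeout waiting for connection from pool'."""


class BoundedPool:
    """Toy connection pool: at most max_connections leases outstanding at once."""

    def __init__(self, max_connections: int):
        self._slots = threading.BoundedSemaphore(max_connections)

    def lease(self, timeout: float = 0.01) -> None:
        if not self._slots.acquire(timeout=timeout):
            raise PoolTimeout("Timeout waiting for connection from pool")

    def release(self) -> None:
        self._slots.release()


def requests_until_exhausted(max_connections: int, leak_every: int,
                             limit: int = 1_000_000) -> int:
    """Every `leak_every`-th request never releases its lease.
    Returns how many requests succeed before a lease times out."""
    pool = BoundedPool(max_connections)
    for i in range(1, limit + 1):
        try:
            pool.lease()
        except PoolTimeout:
            return i - 1
        if i % leak_every != 0:
            pool.release()  # well-behaved request returns its connection
        # otherwise the connection is leaked and never comes back
    return limit


for size in (10, 100, 1000):
    print(size, requests_until_exhausted(size, leak_every=100))
```

In this model the number of successful requests grows linearly with the pool size, so raising fs.s3.maxConnections moves the failure later but does not prevent it; only fixing the unreleased connections does.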

We opened a ticket for this issue:
https://issues.apache.org/jira/browse/KYLIN-4500.
A full stack trace from the QueryService is attached to it.

Since this is seriously affecting our production service, any hint would be 
much appreciated. Is there any chance someone could look into this?

Many thanks,
Andras
