Hi Xiaoxiang,

Thank you, this indeed seems to be related to what we have.

> In the cloud env, an FD leak will be converted to a connection leak issue, am I right?

Yes, that sounds plausible. We will check with netstat; a small in-JVM FD-count probe is also sketched below the quoted mail.
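If I understand it correctly, each open S3 input stream in EMRFS holds both a file descriptor and an HTTP connection checked out of the pool, so a stream that is not closed on every code path leaks both at once, and the pool eventually runs dry no matter how large fs.s3.maxConnections is. A minimal, purely illustrative sketch of the leaky versus safe pattern (not taken from Kylin's code; the bucket and object names below are made up):

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3LeakSketch {

    // Leaky pattern: if read() throws, close() is never reached, so the
    // stream's FD and its pooled HTTP connection are never released.
    // Enough of these and every later request fails with
    // "Timeout waiting for connection from pool".
    static long leakyRead(FileSystem fs, Path path) throws IOException {
        FSDataInputStream in = fs.open(path);
        long total = 0;
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        in.close(); // skipped on any exception above
        return total;
    }

    // Safe pattern: try-with-resources releases the stream (and with it the
    // pooled connection) on every path, including exceptions.
    static long safeRead(FileSystem fs, Path path) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        try (FSDataInputStream in = fs.open(path)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Bucket and key are hypothetical, only to keep the sketch self-contained.
        FileSystem fs = FileSystem.get(URI.create("s3://example-bucket/"), conf);
        System.out.println(safeRead(fs, new Path("s3://example-bucket/some/object")));
    }
}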
Thanks again, best regards,
Andras

On Tue, Jul 21, 2020 at 3:55 PM Xiaoxiang Yu <[email protected]> wrote:

> Dear sir,
> If you are using Real-time OLAP, you may check this issue:
> https://issues.apache.org/jira/browse/KYLIN-4396, and this is the patch link:
> https://github.com/apache/kylin/pull/1134. It is an FD leak issue that I
> found earlier this year. In the cloud env, an FD leak will be converted to a
> connection leak issue, am I right?
> If you think it is a connection leak issue that may be caused by some other
> reason, please let us know your network stats, maybe the output of
> "netstat -anp"?
> Good luck to you!
>
> --
> Best wishes to you!
> From: Xiaoxiang Yu
>
>
> At 2020-07-21 20:42:53, "Andras Nagy" <[email protected]> wrote:
>
> Dear All,
>
> We ran into an issue where, after an extended uptime, both the Kylin query
> server and jobs running on EMR stop working. The root cause on both sides is
> this exception:
>
> Caused by: java.io.IOException:
> com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable
> to execute HTTP request: Timeout waiting for connection from pool
>     at
> com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:257)
> ~[emrfs-hadoop-assembly-2.37.0.jar:?]
>
> In our setup, S3 is used both for intermediate data storage and for
> persistence under HBase.
>
> Based on
> https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/,
> increasing the connection pool size (the fs.s3.maxConnections property) to
> 10,000 only delays the issue, so the underlying problem is likely a
> connection leak. The fact that restarting the Kylin service resolves the
> problem also points to a leak.
>
> We opened a ticket about the issue:
> https://issues.apache.org/jira/browse/KYLIN-4500.
> A full stack trace from the QueryService is attached to the ticket.
>
> Since this is seriously affecting our production service, any hint would be
> much appreciated. Is there any chance someone could look into this?
>
> Many thanks,
> Andras
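The in-JVM FD-count probe mentioned above, as a minimal illustrative sketch: the class and method names are made up, the standard com.sun.management JMX bean does the actual work, and it only reports the JVM it runs inside, so to be useful it would have to be started from within the process being watched (e.g. the Kylin query server); from outside, "netstat -anp" or counting /proc/<pid>/fd entries is the equivalent check.

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

public class FdCountProbe {

    /** Starts a daemon thread that logs the open-FD count once per minute. */
    public static void startLogging() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (!(os instanceof UnixOperatingSystemMXBean)) {
            return; // the open-FD count is only exposed on Unix-like JVMs
        }
        UnixOperatingSystemMXBean unixOs = (UnixOperatingSystemMXBean) os;
        Thread t = new Thread(() -> {
            while (true) {
                // A count that only ever grows between samples is the classic
                // signature of an FD/connection leak.
                System.out.printf("open FDs: %d / max %d%n",
                        unixOs.getOpenFileDescriptorCount(),
                        unixOs.getMaxFileDescriptorCount());
                try {
                    Thread.sleep(60_000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "fd-count-probe");
        t.setDaemon(true);
        t.start();
    }

    public static void main(String[] args) throws Exception {
        startLogging();        // in practice this would be called from inside
        Thread.sleep(180_000); // the process under investigation, not a demo main
    }
}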
