Hi Xiaoxiang,

Thank you, this indeed seems to be related to what we have.
> In the cloud env, FD leak will be converted to a connection leak issue, am I
> right?
Yes, that sounds plausible. We will check with netstat.
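
A minimal sketch of how the netstat output could be summarized to spot a leak. It assumes the Linux `netstat -anp` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program) and that the S3 connections go over HTTPS on port 443; the function name and the default port are illustrative, not taken from Kylin or EMRFS.

```python
from collections import Counter


def summarize_tcp_states(netstat_output: str, remote_port: str = "443") -> Counter:
    """Count TCP connections to `remote_port` by connection state.

    Assumes Linux `netstat -anp` rows of the form:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UDP rows and UNIX-socket rows.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign_address, state = fields[4], fields[5]
        # The port is whatever follows the last colon (works for tcp and tcp6).
        if foreign_address.rsplit(":", 1)[-1] == remote_port:
            counts[state] += 1
    return counts
```

Feeding it the text captured from `netstat -anp` at intervals would show whether the number of connections to that port (for example in CLOSE_WAIT or ESTABLISHED) keeps growing with uptime, which is what a connection leak would look like.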

Thanks again, best regards,
Andras


On Tue, Jul 21, 2020 at 3:55 PM Xiaoxiang Yu <[email protected]> wrote:

> Dear sir,
>   If you are using Real-time OLAP, you may want to check this issue:
> https://issues.apache.org/jira/browse/KYLIN-4396, and here is the patch
> link: https://github.com/apache/kylin/pull/1134. It is an FD leak issue
> that I found early this year. In the cloud env, FD leak will be converted to
> a connection leak issue, am I right?
>   If you think it is a connection leak issue which may be caused by some
> other reason, please let us know your network stats information, maybe the
> command output of "netstat -anp"?
>   Good luck to you!
>
>
>
> --
> *Best wishes to you!*
> *From: Xiaoxiang Yu*
>
>
> At 2020-07-21 20:42:53, "Andras Nagy" <[email protected]>
> wrote:
>
> Dear All,
>
> We ran into an issue where, after an extended uptime, both the Kylin query
> server and the jobs running on EMR stop working. The root cause on both
> sides is this exception:
>
> Caused by: java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
>         at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:257) ~[emrfs-hadoop-assembly-2.37.0.jar:?]
>
> In our setup, S3 is used for both intermediate data storage as well as
> persistence under HBase.
>
> Based on
> https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/
> increasing the connection pool size (fs.s3.maxConnections property) to
> 10 000 only delays the issue, so the underlying problem is likely a
> connection leak.
> The fact that restarting the Kylin service solves the problem also points
> to a leak.
>
> We opened a ticket about the issue:
> https://issues.apache.org/jira/browse/KYLIN-4500.
> A full stack trace from the QueryService is attached to the ticket.
>
> Since this is seriously affecting our production service, any hint would
> be much appreciated. Is there any chance someone could look into this?
>
> Many thanks,
> Andras
>
>
