Hi Arpit,

I contributed the patch [1] a while ago, but it hasn’t been merged yet. Could you please tell me when to expect it in the main branch?
[1] https://issues.apache.org/jira/browse/HADOOP-15358

Cheers,
Mike

On Wed, 22 Aug 2018 at 01:06, Mikhail Pryakhin <m.prya...@gmail.com> wrote:

> Hi Arpit,
> Thank you, I will contribute the fix shortly.
>
> On Wed, 22 Aug 2018 at 00:52, Arpit Agarwal <aagar...@hortonworks.com> wrote:
>
>> Hi Mikhail,
>>
>> There are two ways to contribute a fix:
>>
>> 1. Attach a patch file to the Jira and click “Submit Patch”. I’ve made
>> you a contributor and assigned it to you.
>> 2. Submit a GitHub pull request with the Jira key in the title. I haven’t
>> tried this in a while and am not sure it still works.
>>
>> *From: *Mikhail Pryakhin <m.prya...@gmail.com>
>> *Date: *Tuesday, August 21, 2018 at 2:40 PM
>> *To: *<user@hadoop.apache.org>
>> *Subject: *SFTPConnectionPool connections leakage
>>
>> Hi,
>>
>> I’ve come across a connection leak while using SFTPFileSystem. Methods of
>> SFTPFileSystem operate on poolable ChannelSftp instances, and some of these
>> methods are chained together, so a single compound action ends up
>> establishing multiple connections to the SFTP server. The affected methods
>> are:
>>
>> *mkdirs method* [1]. The public mkdirs method acquires a ChannelSftp from
>> the pool and then recursively creates directories, first checking whether
>> each directory exists by calling exists [2], which delegates to
>> getFileStatus(ChannelSftp channel, Path file) [3], and so on until a
>> FileStatus instance is returned [4]. The leak occurs in
>> getWorkingDirectory, which calls getHomeDirectory [5]; that method
>> establishes a new connection to the SFTP server instead of reusing the
>> already acquired one. Because mkdirs is recursive, this results in a huge
>> number of connections.
>>
>> *open method* [6].
>> This method returns an FSDataInputStream that wraps an SFTPInputStream,
>> which does not return its acquired ChannelSftp to the pool but closes it
>> instead [7]. As a result, a new connection to the SFTP server is
>> established the next time a method is called on the FileSystem instance.
>>
>> I’ve filed a Jira ticket,
>> https://issues.apache.org/jira/browse/HADOOP-15358, and fixed the
>> connection leak.
>>
>> Could I create a pull request to merge the fix?
>>
>> [1] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L658
>> [2] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L321
>> [3] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L202
>> [4] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L290
>> [5] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L640
>> [6] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L504
>> [7] https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPInputStream.java#L123
>>
>> Kind Regards,
>> Mike Pryakhin
>
> --
> Regards,
> Mikhail Pryakhin.
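The leak shape described in the thread can be illustrated with a toy pool. This is a minimal sketch under stated assumptions: Channel, Pool, mkdirsLeaky, and mkdirsFixed are hypothetical stand-ins, not the actual ChannelSftp, SFTPConnectionPool, or SFTPFileSystem code. The point is the difference between a nested helper that opens its own connection (as getHomeDirectory did) versus one that reuses the caller's already-acquired channel; the SFTPInputStream issue is the same shape, with close() discarding the channel instead of returning it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-ins for ChannelSftp / SFTPConnectionPool; names and
// structure are illustrative assumptions, not the actual Hadoop classes.
public class PoolSketch {
    static class Channel {}

    static class Pool {
        private final Deque<Channel> idle = new ArrayDeque<>();
        int created = 0; // counts real connections established

        Channel connect() {                 // acquire a channel, reusing idle ones
            if (!idle.isEmpty()) return idle.pop();
            created++;                       // each pool miss costs a real connection
            return new Channel();
        }

        void disconnect(Channel c) {         // return the channel to the pool
            idle.push(c);
        }
    }

    // Leaky shape: each nested step opens its own channel instead of
    // reusing the one the caller already holds (cf. getHomeDirectory).
    static void mkdirsLeaky(Pool pool, int depth) {
        Channel c = pool.connect();
        for (int i = 0; i < depth; i++) {
            Channel extra = pool.connect();  // fresh connection per recursion step
            // ... getFileStatus(extra, path) ...
            // never disconnected: the channel is leaked
        }
        pool.disconnect(c);
    }

    // Fixed shape: thread the already-acquired channel through the helpers,
    // as the HADOOP-15358 patch does for the chained SFTPFileSystem methods.
    static void mkdirsFixed(Pool pool, int depth) {
        Channel c = pool.connect();
        for (int i = 0; i < depth; i++) {
            // ... getFileStatus(c, path) ... reuses the same channel
        }
        pool.disconnect(c);
    }

    public static void main(String[] args) {
        Pool leaky = new Pool();
        mkdirsLeaky(leaky, 5);
        Pool fixed = new Pool();
        mkdirsFixed(fixed, 5);
        // leaky.created is 6 (1 + one per step); fixed.created is 1
        System.out.println(leaky.created + " " + fixed.created);
    }
}
```

Running the sketch with depth 5 shows the leaky variant establishing one connection per recursion step on top of the initial one, while the fixed variant needs only a single connection for the whole compound operation.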