Hi Arpit,

I contributed the patch[1] some time ago, but it hasn’t been merged yet.
Could you please tell me when to expect it in the main branch?


[1]
https://issues.apache.org/jira/browse/HADOOP-15358

Cheers,
 Mike

On Wed, 22 Aug 2018 at 01:06, Mikhail Pryakhin <m.prya...@gmail.com> wrote:

> Hi Arpit,
> Thank you, will contribute the fix shortly.
>
> On Wed, 22 Aug 2018 at 00:52, Arpit Agarwal <aagar...@hortonworks.com>
> wrote:
>
>> Hi Mikhail,
>>
>>
>>
>> There are two ways to contribute a fix:
>>
>>    1. Attach a patch file to the Jira and click “Submit Patch”. I’ve
>>    made you a contributor and assigned it to you.
>>    2. Submit a GitHub pull request with the Jira key in the title. I
>>    haven't tried this in a while and am not sure it still works.
>>
>>
>>
>>
>>
>> *From: *Mikhail Pryakhin <m.prya...@gmail.com>
>> *Date: *Tuesday, August 21, 2018 at 2:40 PM
>> *To: *<user@hadoop.apache.org>
>> *Subject: *SFTPConnectionPool connections leakage
>>
>>
>>
>> Hi,
>>
>> I’ve come across a connection leak while using SFTPFileSystem. Its methods
>> operate on poolable ChannelSftp instances, and some of them are chained
>> together, so a single compound action ends up establishing multiple
>> connections to the SFTP server. The affected methods are:
>>
>>
>>
>> *mkdirs method*[1]*. *The public mkdirs method acquires a new
>> ChannelSftp from the pool and then recursively creates directories. It
>> first checks for directory existence by calling the exists method [2],
>> which delegates to the getFileStatus(ChannelSftp channel, Path file)
>> method [3], and so on until a FileStatus instance is returned [4]. The
>> leak occurs in the getWorkingDirectory method, which calls the
>> getHomeDirectory method [5]; that method establishes a new connection to
>> the SFTP server instead of reusing the already-acquired channel. Because
>> mkdirs is recursive, this creates a huge number of connections.
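To make the pattern concrete, here is a minimal, hypothetical Java sketch (not the Hadoop source; every name in it is invented) contrasting a helper that dials the server on each call with one that reuses the caller's channel:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the Hadoop source: models how a recursive
// mkdirs leaks connections when a helper opens its own session per call
// instead of reusing the caller's channel.
class LeakSketch {
    static final AtomicInteger connectionsOpened = new AtomicInteger();

    // Stands in for a pool's connect(): every call is a new session.
    static String connect() {
        connectionsOpened.incrementAndGet();
        return "channel";
    }

    // Buggy shape: ignores the caller's channel and dials the server again.
    static boolean existsBuggy(String channel, String path) {
        String own = connect(); // fresh connection for every existence check
        return false;           // pretend the directory is missing
    }

    // Fixed shape: reuses the channel the caller already holds.
    static boolean existsFixed(String channel, String path) {
        return false;
    }

    static void mkdirs(String path, boolean fixed) {
        String channel = connect(); // one connection for the whole call
        String p = path;
        while (!p.isEmpty()) {      // recursion flattened to a loop
            if (fixed) existsFixed(channel, p); else existsBuggy(channel, p);
            int slash = p.lastIndexOf('/');
            p = slash > 0 ? p.substring(0, slash) : "";
        }
    }

    public static void main(String[] args) {
        mkdirs("/a/b/c/d", false);
        int buggy = connectionsOpened.getAndSet(0);
        mkdirs("/a/b/c/d", true);
        int fixedCount = connectionsOpened.get();
        System.out.println(buggy + " vs " + fixedCount); // prints "5 vs 1"
    }
}
```

In this toy model the buggy shape opens one connection per path component on top of the initial one, while the fixed shape opens exactly one connection for the whole mkdirs call.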
>>
>>
>>
>> *open method* [6]. This method returns an FSDataInputStream that wraps
>> an SFTPInputStream instance, which doesn't return its acquired
>> ChannelSftp instance to the pool but closes it instead [7]. As a result,
>> the next method called on the FileSystem instance has to establish yet
>> another connection to the SFTP server.
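The fix direction for the stream case can be sketched the same way: a stream wrapper whose close() hands its channel back to the pool instead of disconnecting it. Again, this is an illustrative sketch with invented names, not the actual SFTPInputStream code:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the fix direction (invented names, not the real
// SFTPInputStream): close() returns the channel to the pool so the next
// FileSystem call can reuse it.
class PooledStreamSketch {
    // Minimal stand-in for a connection pool; channels are plain Objects.
    static final Deque<Object> pool = new ArrayDeque<>();

    static Object acquire() {
        return pool.isEmpty() ? new Object() : pool.pop();
    }

    static void release(Object channel) {
        pool.push(channel);
    }

    // Wraps the data stream plus the channel that produced it.
    static class PooledInputStream extends InputStream {
        private final InputStream delegate;
        private final Object channel;

        PooledInputStream(InputStream delegate, Object channel) {
            this.delegate = delegate;
            this.channel = channel;
        }

        @Override public int read() throws IOException {
            return delegate.read();
        }

        @Override public void close() throws IOException {
            delegate.close();
            release(channel); // instead of disconnecting the channel
        }
    }

    public static void main(String[] args) throws IOException {
        Object channel = acquire();
        InputStream in = new PooledInputStream(
                new java.io.ByteArrayInputStream(new byte[]{1, 2}), channel);
        in.close();
        System.out.println(pool.size()); // prints "1": channel is reusable
    }
}
```

The design point is that the stream owns the channel only for its lifetime; releasing rather than closing on close() keeps the pool's invariant that every acquired channel eventually comes back.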
>>
>>
>>
>> I’ve filed a Jira ticket,
>> https://issues.apache.org/jira/browse/HADOOP-15358, and fixed the
>> connection leak.
>>
>> Could I create a pull request to merge the fix?
>>
>>
>>
>> [1]
>> https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L658
>>
>> [2]
>> https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L321
>>
>> [3]
>> https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L202
>>
>> [4]
>> https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L290
>>
>> [5]
>> https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L640
>>
>> [6]
>> https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L504
>>
>> [7]
>> https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPInputStream.java#L123
>>
>>
>>
>>
>>
>> Kind Regards,
>>
>> Mike Pryakhin
>>
>>
>>
>> --
> Regards, Mikhail Pryakhin.
>
-- 
Regards, Mikhail Pryakhin.