[jira] [Updated] (HADOOP-17528) Not closing an SFTP File System instance prevents JVM from exiting.

2021-02-23 Thread Mikhail Pryakhin (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Pryakhin updated HADOOP-17528:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Not closing an SFTP File System instance prevents JVM from exiting. 
> 
>
> Key: HADOOP-17528
> URL: https://issues.apache.org/jira/browse/HADOOP-17528
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Mikhail Pryakhin
>Assignee: Mikhail Pryakhin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> The SFTP file system uses a connection pool that is not closed when a file 
> system instance is closed. Because every SFTP connection runs in a separate 
> non-daemon thread, the open pool prevents the JVM from exiting.
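
The failure mode described above can be sketched in a few lines. This is only an illustration under stated assumptions: FakeConnection, FakeConnectionPool and FakeFileSystem are hypothetical stand-ins, not Hadoop or JSch classes, and closing the pool from the file system's close() is one possible remedy rather than a description of the actual patch.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/** Hypothetical stand-in for one pooled SFTP connection (not a Hadoop or JSch class). */
final class FakeConnection {
  private final CountDownLatch stop = new CountDownLatch(1);
  private final Thread worker;

  FakeConnection(String name) {
    worker = new Thread(() -> {
      try {
        stop.await(); // simulates a session thread that runs until disconnect
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, name);
    worker.setDaemon(false); // non-daemon: the JVM cannot exit while this thread runs
    worker.start();
  }

  boolean isAlive() {
    return worker.isAlive();
  }

  void disconnect() {
    stop.countDown();
    try {
      worker.join(); // wait for the session thread to finish
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}

/** Hypothetical pool; close() disconnects every connection it handed out. */
final class FakeConnectionPool implements Closeable {
  private final List<FakeConnection> connections = new ArrayList<>();

  synchronized FakeConnection connect(String name) {
    FakeConnection c = new FakeConnection(name);
    connections.add(c);
    return c;
  }

  @Override
  public synchronized void close() {
    for (FakeConnection c : connections) {
      c.disconnect();
    }
    connections.clear();
  }
}

/** Hypothetical file system that owns the pool and closes it in close(). */
final class FakeFileSystem implements Closeable {
  private final FakeConnectionPool pool = new FakeConnectionPool();

  FakeConnection open(String name) {
    return pool.connect(name);
  }

  @Override
  public void close() {
    pool.close(); // without this call the worker threads outlive the file system
  }
}
```

If FakeFileSystem.close() did not call pool.close(), every worker thread would stay blocked and a program using it would hang at the end of main instead of exiting.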



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HADOOP-17528) Not closing an SFTP File System instance prevents JVM from exiting.

2021-02-14 Thread Mikhail Pryakhin (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Pryakhin updated HADOOP-17528:
--
Comment: was deleted

(was: I've created a [PR|https://github.com/apache/hadoop/pull/2701], could 
someone review it please?)

> Not closing an SFTP File System instance prevents JVM from exiting. 
> 
>
> Key: HADOOP-17528
> URL: https://issues.apache.org/jira/browse/HADOOP-17528
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Mikhail Pryakhin
>Assignee: Mikhail Pryakhin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> SFTP file system leverages a connection pool which is not closed when a file 
> system instance gets closed preventing a JVM from exiting as every SFTP 
> connection runs in a separate non-daemon thread.






[jira] [Commented] (HADOOP-17528) Not closing an SFTP File System instance prevents JVM from exiting.

2021-02-14 Thread Mikhail Pryakhin (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284564#comment-17284564
 ] 

Mikhail Pryakhin commented on HADOOP-17528:
---

I've created a [PR|https://github.com/apache/hadoop/pull/2701]. Could someone 
review it, please?

> Not closing an SFTP File System instance prevents JVM from exiting. 
> 
>
> Key: HADOOP-17528
> URL: https://issues.apache.org/jira/browse/HADOOP-17528
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Mikhail Pryakhin
>Assignee: Mikhail Pryakhin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> SFTP file system leverages a connection pool which is not closed when a file 
> system instance gets closed preventing a JVM from exiting as every SFTP 
> connection runs in a separate non-daemon thread.






[jira] [Updated] (HADOOP-17528) Not closing an SFTP File System instance prevents JVM from exiting.

2021-02-14 Thread Mikhail Pryakhin (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Pryakhin updated HADOOP-17528:
--
Affects Version/s: 3.2.0
   Status: Patch Available  (was: Open)

https://github.com/apache/hadoop/pull/2701.patch

> Not closing an SFTP File System instance prevents JVM from exiting. 
> 
>
> Key: HADOOP-17528
> URL: https://issues.apache.org/jira/browse/HADOOP-17528
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Mikhail Pryakhin
>Assignee: Mikhail Pryakhin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> SFTP file system leverages a connection pool which is not closed when a file 
> system instance gets closed preventing a JVM from exiting as every SFTP 
> connection runs in a separate non-daemon thread.






[jira] [Updated] (HADOOP-17528) Not closing an SFTP File System instance prevents JVM from exiting.

2021-02-13 Thread Mikhail Pryakhin (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Pryakhin updated HADOOP-17528:
--
Description: SFTP file system leverages a connection pool which is not 
closed when a file system instance gets closed preventing a JVM from exiting as 
every SFTP connection runs in a separate non-daemon thread.  (was: Not closing 
an SFTP File System instance prevents JVM from exiting. 
SFTP file system leverages a connection pool which is not closed when a file 
system instance gets closed preventing a JVM from exiting as every SFTP 
connection runs in a separate non-daemon thread.)

> Not closing an SFTP File System instance prevents JVM from exiting. 
> 
>
> Key: HADOOP-17528
> URL: https://issues.apache.org/jira/browse/HADOOP-17528
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Mikhail Pryakhin
>Assignee: Mikhail Pryakhin
>Priority: Major
>
> SFTP file system leverages a connection pool which is not closed when a file 
> system instance gets closed preventing a JVM from exiting as every SFTP 
> connection runs in a separate non-daemon thread.






[jira] [Created] (HADOOP-17528) Not closing an SFTP File System instance prevents JVM from exiting.

2021-02-13 Thread Mikhail Pryakhin (Jira)
Mikhail Pryakhin created HADOOP-17528:
-

 Summary: Not closing an SFTP File System instance prevents JVM 
from exiting. 
 Key: HADOOP-17528
 URL: https://issues.apache.org/jira/browse/HADOOP-17528
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Mikhail Pryakhin
Assignee: Mikhail Pryakhin


Not closing an SFTP File System instance prevents the JVM from exiting. 
The SFTP file system uses a connection pool that is not closed when a file 
system instance is closed. Because every SFTP connection runs in a separate 
non-daemon thread, the open pool prevents the JVM from exiting.






[jira] [Commented] (HADOOP-14566) Add seek support for SFTP FileSystem

2020-05-28 Thread Mikhail Pryakhin (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118548#comment-17118548
 ] 

Mikhail Pryakhin commented on HADOOP-14566:
---

[~ste...@apache.org] thanks a lot for your review. All the issues you pointed 
out have been fixed. Could we let it go in?

> Add seek support for SFTP FileSystem
> 
>
> Key: HADOOP-14566
> URL: https://issues.apache.org/jira/browse/HADOOP-14566
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Azhagu Selvan SP
>Assignee: Mikhail Pryakhin
>Priority: Minor
> Attachments: HADOOP-14566.001.patch, HADOOP-14566.patch
>
>
> This patch adds seek() method implementation for SFTP FileSystem and a unit 
> test for the same






[jira] [Comment Edited] (HADOOP-14566) Add seek support for SFTP FileSystem

2020-05-28 Thread Mikhail Pryakhin (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118548#comment-17118548
 ] 

Mikhail Pryakhin edited comment on HADOOP-14566 at 5/28/20, 11:06 AM:
--

[~ste...@apache.org] thanks a lot for your review. All the issues you pointed 
out have been fixed. Could we let it go in?


was (Author: m.pryahin):
[~ste...@apache.org] thanks a lot for your review. All the issues you pointed 
out have been fixed. Could we let it to go in?

> Add seek support for SFTP FileSystem
> 
>
> Key: HADOOP-14566
> URL: https://issues.apache.org/jira/browse/HADOOP-14566
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Azhagu Selvan SP
>Assignee: Mikhail Pryakhin
>Priority: Minor
> Attachments: HADOOP-14566.001.patch, HADOOP-14566.patch
>
>
> This patch adds seek() method implementation for SFTP FileSystem and a unit 
> test for the same






[jira] [Commented] (HADOOP-14566) Add seek support for SFTP FileSystem

2020-05-19 Thread Mikhail Pryakhin (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111308#comment-17111308
 ] 

Mikhail Pryakhin commented on HADOOP-14566:
---

Hey [~ste...@apache.org], could you please review the changes introduced by 
the patch? They're quite important for me. Thanks :)

> Add seek support for SFTP FileSystem
> 
>
> Key: HADOOP-14566
> URL: https://issues.apache.org/jira/browse/HADOOP-14566
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Azhagu Selvan SP
>Assignee: Mikhail Pryakhin
>Priority: Minor
> Attachments: HADOOP-14566.001.patch, HADOOP-14566.patch
>
>
> This patch adds seek() method implementation for SFTP FileSystem and a unit 
> test for the same






[jira] [Commented] (HADOOP-17036) TestFTPFileSystem failing as ftp server dir already exists

2020-05-14 Thread Mikhail Pryakhin (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107554#comment-17107554
 ] 

Mikhail Pryakhin commented on HADOOP-17036:
---

Why did it fail to integrate changes? Can I help somehow? 

> TestFTPFileSystem failing as ftp server dir already exists
> --
>
> Key: HADOOP-17036
> URL: https://issues.apache.org/jira/browse/HADOOP-17036
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, test
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Mikhail Pryakhin
>Priority: Minor
> Fix For: 3.4.0
>
>
> TestFTPFileSystem failing as the test dir exists.
> need to delete in setup/teardown of each test case






[jira] [Commented] (HADOOP-17036) TestFTPFileSystem failing as ftp server dir already exists

2020-05-14 Thread Mikhail Pryakhin (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107263#comment-17107263
 ] 

Mikhail Pryakhin commented on HADOOP-17036:
---

Hey [~ste...@apache.org], the test runner reports success. 
Could you please check it out? Thank you!

> TestFTPFileSystem failing as ftp server dir already exists
> --
>
> Key: HADOOP-17036
> URL: https://issues.apache.org/jira/browse/HADOOP-17036
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, test
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Mikhail Pryakhin
>Priority: Minor
>
> TestFTPFileSystem failing as the test dir exists.
> need to delete in setup/teardown of each test case






[jira] [Comment Edited] (HADOOP-17036) TestFTPFileSystem failing as ftp server dir already exists

2020-05-12 Thread Mikhail Pryakhin (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105578#comment-17105578
 ] 

Mikhail Pryakhin edited comment on HADOOP-17036 at 5/12/20, 5:12 PM:
-

here it is:

[https://github.com/apache/hadoop/pull/2009]

The test runner now fails the build because of the following test failure. 
I'm investigating the cause, since the test passes locally.

{code:java}
java.lang.AssertionError: Expected exactly one metric for name RpcServerExceptionNumOps expected:<1> but was:<0>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:834)
    at org.junit.Assert.assertEquals(Assert.java:645)
    at org.apache.hadoop.test.MetricsAsserts.checkCaptured(MetricsAsserts.java:278)
    at org.apache.hadoop.test.MetricsAsserts.getLongCounter(MetricsAsserts.java:237)
    at org.apache.hadoop.test.MetricsAsserts.assertCounter(MetricsAsserts.java:231)
    at org.apache.hadoop.ipc.TestRPC.testCallsInternal(TestRPC.java:510)
    at org.apache.hadoop.ipc.TestRPC.testCalls(TestRPC.java:428)
{code}



was (Author: m.pryahin):
here it is:

[https://github.com/apache/hadoop/pull/2009]

> TestFTPFileSystem failing as ftp server dir already exists
> --
>
> Key: HADOOP-17036
> URL: https://issues.apache.org/jira/browse/HADOOP-17036
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, test
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Mikhail Pryakhin
>Priority: Minor
>
> TestFTPFileSystem failing as the test dir exists.
> need to delete in setup/teardown of each test case






[jira] [Commented] (HADOOP-14566) Add seek support for SFTP FileSystem

2020-05-12 Thread Mikhail Pryakhin (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105579#comment-17105579
 ] 

Mikhail Pryakhin commented on HADOOP-14566:
---

A PR is available here:

[https://github.com/apache/hadoop/pull/1999]

> Add seek support for SFTP FileSystem
> 
>
> Key: HADOOP-14566
> URL: https://issues.apache.org/jira/browse/HADOOP-14566
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Azhagu Selvan SP
>Assignee: Mikhail Pryakhin
>Priority: Minor
> Attachments: HADOOP-14566.001.patch, HADOOP-14566.patch
>
>
> This patch adds seek() method implementation for SFTP FileSystem and a unit 
> test for the same






[jira] [Commented] (HADOOP-17036) TestFTPFileSystem failing as ftp server dir already exists

2020-05-12 Thread Mikhail Pryakhin (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105578#comment-17105578
 ] 

Mikhail Pryakhin commented on HADOOP-17036:
---

here it is:

[https://github.com/apache/hadoop/pull/2009]

> TestFTPFileSystem failing as ftp server dir already exists
> --
>
> Key: HADOOP-17036
> URL: https://issues.apache.org/jira/browse/HADOOP-17036
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, test
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Mikhail Pryakhin
>Priority: Minor
>
> TestFTPFileSystem failing as the test dir exists.
> need to delete in setup/teardown of each test case






[jira] [Updated] (HADOOP-17036) TestFTPFileSystem failing as ftp server dir already exists

2020-05-11 Thread Mikhail Pryakhin (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Pryakhin updated HADOOP-17036:
--
Status: Patch Available  (was: Open)

patch available

[https://github.com/apache/hadoop/pull/2009.patch]

> TestFTPFileSystem failing as ftp server dir already exists
> --
>
> Key: HADOOP-17036
> URL: https://issues.apache.org/jira/browse/HADOOP-17036
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, test
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Mikhail Pryakhin
>Priority: Minor
>
> TestFTPFileSystem failing as the test dir exists.
> need to delete in setup/teardown of each test case






[jira] [Assigned] (HADOOP-17036) TestFTPFileSystem failing as ftp server dir already exists

2020-05-11 Thread Mikhail Pryakhin (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Pryakhin reassigned HADOOP-17036:
-

Assignee: Mikhail Pryakhin

> TestFTPFileSystem failing as ftp server dir already exists
> --
>
> Key: HADOOP-17036
> URL: https://issues.apache.org/jira/browse/HADOOP-17036
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, test
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Mikhail Pryakhin
>Priority: Minor
>
> TestFTPFileSystem failing as the test dir exists.
> need to delete in setup/teardown of each test case






[jira] [Updated] (HADOOP-14566) Add seek support for SFTP FileSystem

2020-05-07 Thread Mikhail Pryakhin (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Pryakhin updated HADOOP-14566:
--
Status: Patch Available  (was: In Progress)

> Add seek support for SFTP FileSystem
> 
>
> Key: HADOOP-14566
> URL: https://issues.apache.org/jira/browse/HADOOP-14566
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Azhagu Selvan SP
>Assignee: Mikhail Pryakhin
>Priority: Minor
> Attachments: HADOOP-14566.001.patch, HADOOP-14566.patch
>
>
> This patch adds seek() method implementation for SFTP FileSystem and a unit 
> test for the same






[jira] [Commented] (HADOOP-14566) Add seek support for SFTP FileSystem

2020-05-07 Thread Mikhail Pryakhin (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101713#comment-17101713
 ] 

Mikhail Pryakhin commented on HADOOP-14566:
---

The patch is available here:

[https://patch-diff.githubusercontent.com/raw/apache/hadoop/pull/1999.patch]

> Add seek support for SFTP FileSystem
> 
>
> Key: HADOOP-14566
> URL: https://issues.apache.org/jira/browse/HADOOP-14566
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Azhagu Selvan SP
>Assignee: Mikhail Pryakhin
>Priority: Minor
> Attachments: HADOOP-14566.001.patch, HADOOP-14566.patch
>
>
> This patch adds seek() method implementation for SFTP FileSystem and a unit 
> test for the same






[jira] [Updated] (HADOOP-14566) Add seek support for SFTP FileSystem

2020-05-06 Thread Mikhail Pryakhin (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Pryakhin updated HADOOP-14566:
--
Status: In Progress  (was: Patch Available)

> Add seek support for SFTP FileSystem
> 
>
> Key: HADOOP-14566
> URL: https://issues.apache.org/jira/browse/HADOOP-14566
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Azhagu Selvan SP
>Assignee: Mikhail Pryakhin
>Priority: Minor
> Attachments: HADOOP-14566.001.patch, HADOOP-14566.patch
>
>
> This patch adds seek() method implementation for SFTP FileSystem and a unit 
> test for the same






[jira] [Assigned] (HADOOP-14566) Add seek support for SFTP FileSystem

2020-05-06 Thread Mikhail Pryakhin (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Pryakhin reassigned HADOOP-14566:
-

Assignee: Mikhail Pryakhin

> Add seek support for SFTP FileSystem
> 
>
> Key: HADOOP-14566
> URL: https://issues.apache.org/jira/browse/HADOOP-14566
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Azhagu Selvan SP
>Assignee: Mikhail Pryakhin
>Priority: Minor
> Attachments: HADOOP-14566.001.patch, HADOOP-14566.patch
>
>
> This patch adds seek() method implementation for SFTP FileSystem and a unit 
> test for the same






[jira] [Commented] (HADOOP-14566) Add seek support for SFTP FileSystem

2020-05-06 Thread Mikhail Pryakhin (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100912#comment-17100912
 ] 

Mikhail Pryakhin commented on HADOOP-14566:
---

I've implemented both backward and forward lazy seeks, as well as 
`AbstractContractSeekTest` for the SFTP file system.

> Add seek support for SFTP FileSystem
> 
>
> Key: HADOOP-14566
> URL: https://issues.apache.org/jira/browse/HADOOP-14566
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Azhagu Selvan SP
>Priority: Minor
> Attachments: HADOOP-14566.001.patch, HADOOP-14566.patch
>
>
> This patch adds seek() method implementation for SFTP FileSystem and a unit 
> test for the same






[jira] [Comment Edited] (HADOOP-9713) FSDataInputStream.readFully doesn't work on filesystems without seek -even when the offset==getPos

2020-05-03 Thread Mikhail Pryakhin (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-9713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098266#comment-17098266
 ] 

Mikhail Pryakhin edited comment on HADOOP-9713 at 5/3/20, 3:40 PM:
---

Another option is to defer the seek call until the next:
{code:java}
FSDataInputStream#read(long position, byte[] buffer, int offset, int length){code}
invocation, making it lazy. Subsequent reads normally continue from the 
position where the previous read finished, so no seek is needed in that case. 
We only need to seek when the current FSDataInputStream#getPos() != requested 
position. In the standard read scenario, this drastically reduces the number 
of seeks.
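
The lazy-seek idea can be sketched roughly as follows. This is a minimal illustration, not the code from any patch: LazySeekStream and its members are hypothetical names, an in-memory byte array stands in for the remote file, and "repositioning" here simply re-wraps the array, whereas a remote stream would have to skip forward or reopen.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

/** Hypothetical lazy-seek stream over an in-memory array (stands in for a remote file). */
final class LazySeekStream extends InputStream {
  private final byte[] data;
  private ByteArrayInputStream in;
  private long pos;          // where the underlying stream really is
  private long nextPos;      // where the caller asked to be
  private int realSeekCount; // how many times the underlying stream was repositioned

  LazySeekStream(byte[] data) {
    this.data = data;
    this.in = new ByteArrayInputStream(data);
  }

  /** Records the target only; the underlying stream is not touched. */
  void seek(long target) {
    if (target < 0 || target > data.length) {
      throw new IllegalArgumentException("Cannot seek to " + target);
    }
    nextPos = target;
  }

  long getPos() {
    return nextPos;
  }

  int realSeekCount() {
    return realSeekCount;
  }

  @Override
  public int read() {
    if (nextPos != pos) {
      // The deferred seek happens here, and only if the positions differ.
      int offset = (int) nextPos;
      in = new ByteArrayInputStream(data, offset, data.length - offset);
      pos = nextPos;
      realSeekCount++;
    }
    int b = in.read();
    if (b >= 0) {
      pos++;
      nextPos = pos;
    }
    return b;
  }
}
```

Calling seek() with the position where the previous read ended costs nothing, and several seek() calls in a row collapse into at most one repositioning at the next read().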


was (Author: m.pryahin):
Another option is to defer a seek call until the next
{code:java}
FsDataInputstream#read(long position, byte[] buffer, int offset, int 
length){code}
invocation, making it lazy. Normally the subsequent reads will proceed reading 
from the position where the previous read finished, meaning we can avoid making 
seek operations in this case. We will only need to seek when the current 
FsDataInputstream#getPos() != requested position. In standard read scenario, 
this will drastically reduce the number of seeks.

> FSDataInputStream.readFully doesn't work on filesystems without seek -even 
> when the offset==getPos
> --
>
> Key: HADOOP-9713
> URL: https://issues.apache.org/jira/browse/HADOOP-9713
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.1.0-beta, 1.3.0, 3.0.0-alpha1
>Reporter: Steve Loughran
>Assignee: Mikhail Pryakhin
>Priority: Minor
>
> {{FSDataInputStream.readFully(offset,data)}} doesn't work even if the 
> offset==the current location -because it always seeks to the offset and seeks 
> back. No seek => Exception.
> We could optimise {{FSDataInputStream.readFully(offset,data)}} to eliminate 
> the seeks on these operations -which would have tangible benefits for those 
> filesystems where seek is expensive (remote blobstores). It would also let 
> you use readFully against filesystems without seeks, provided you are only 
> reading from the current location.
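
The optimisation proposed in the description can be sketched as below. The names ForwardOnlySource and PositionedReads are hypothetical and only illustrate the idea: seek is attempted only when the requested offset differs from the current position. Note one assumption that may not hold for the real API: on the no-seek path this sketch lets the stream position advance, whereas a positioned read is normally expected to leave the offset unchanged.

```java
import java.io.EOFException;
import java.io.UncheckedIOException;

/** Hypothetical stream that can only read forward; seek() always fails. */
final class ForwardOnlySource {
  private final byte[] data;
  private int pos;

  ForwardOnlySource(byte[] data) {
    this.data = data;
  }

  long getPos() {
    return pos;
  }

  void seek(long target) {
    throw new UnsupportedOperationException("seek not supported");
  }

  /** Reads up to len bytes; returns -1 at end of data. */
  int read(byte[] buf, int off, int len) {
    if (len == 0) {
      return 0;
    }
    if (pos >= data.length) {
      return -1;
    }
    int n = Math.min(len, data.length - pos);
    System.arraycopy(data, pos, buf, off, n);
    pos += n;
    return n;
  }
}

final class PositionedReads {
  private PositionedReads() {}

  /** Fills buffer from position, seeking only when position != current position. */
  static void readFully(ForwardOnlySource in, long position, byte[] buffer) {
    long oldPos = in.getPos();
    boolean mustSeek = position != oldPos;
    if (mustSeek) {
      in.seek(position); // only reached when the offsets differ
    }
    try {
      int done = 0;
      while (done < buffer.length) {
        int n = in.read(buffer, done, buffer.length - done);
        if (n < 0) {
          throw new UncheckedIOException(
              new EOFException("End of data before buffer was filled"));
        }
        done += n;
      }
    } finally {
      if (mustSeek) {
        in.seek(oldPos); // restore only if we moved away
      }
    }
  }
}
```

With this shape, a caller that always reads from the current location never triggers seek(), so the call works on a source without seek support; any other offset still fails as before.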






[jira] [Comment Edited] (HADOOP-9713) FSDataInputStream.readFully doesn't work on filesystems without seek -even when the offset==getPos

2020-05-03 Thread Mikhail Pryakhin (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-9713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098266#comment-17098266
 ] 

Mikhail Pryakhin edited comment on HADOOP-9713 at 5/3/20, 3:39 PM:
---

Another option is to defer the seek call until the next
{code:java}
FSDataInputStream#read(long position, byte[] buffer, int offset, int length){code}
invocation, making it lazy. Subsequent reads normally continue from the 
position where the previous read finished, so no seek is needed in that case. 
We only need to seek when the current FSDataInputStream#getPos() != requested 
position. In the standard read scenario, this drastically reduces the number 
of seeks.


was (Author: m.pryahin):
Another option is to defer a seek call until the next `FsDataInputstream.
read(long position, byte[] buffer, int offset, int length)` invocation, making 
it lazy. Normally the subsequent reads will proceed reading from the position 
where the previous read finished, meaning we can avoid making seek operations 
in this case. We will only need to seek when the current  
`FsDataInputstream.getPos() != requested position`. In standard read scenario, 
this will drastically reduce the number of seeks.

> FSDataInputStream.readFully doesn't work on filesystems without seek -even 
> when the offset==getPos
> --
>
> Key: HADOOP-9713
> URL: https://issues.apache.org/jira/browse/HADOOP-9713
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.1.0-beta, 1.3.0, 3.0.0-alpha1
>Reporter: Steve Loughran
>Assignee: Mikhail Pryakhin
>Priority: Minor
>
> {{FSDataInputStream.readFully(offset,data)}} doesn't work even if the 
> offset==the current location -because it always seeks to the offset and seeks 
> back. No seek => Exception.
> We could optimise {{FSDataInputStream.readFully(offset,data)}} to eliminate 
> the seeks on these operations -which would have tangible benefits for those 
> filesystems where seek is expensive (remote blobstores). It would also let 
> you use readFully against filesystems without seeks, provided you are only 
> reading from the current location.






[jira] [Assigned] (HADOOP-9713) FSDataInputStream.readFully doesn't work on filesystems without seek -even when the offset==getPos

2020-05-03 Thread Mikhail Pryakhin (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-9713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Pryakhin reassigned HADOOP-9713:


Assignee: Mikhail Pryakhin

> FSDataInputStream.readFully doesn't work on filesystems without seek -even 
> when the offset==getPos
> --
>
> Key: HADOOP-9713
> URL: https://issues.apache.org/jira/browse/HADOOP-9713
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.1.0-beta, 1.3.0, 3.0.0-alpha1
>Reporter: Steve Loughran
>Assignee: Mikhail Pryakhin
>Priority: Minor
>
> {{FSDataInputStream.readFully(offset,data)}} doesn't work even if the 
> offset==the current location -because it always seeks to the offset and seeks 
> back. No seek => Exception.
> We could optimise {{FSDataInputStream.readFully(offset,data)}} to eliminate 
> the seeks on these operations -which would have tangible benefits for those 
> filesystems where seek is expensive (remote blobstores). It would also let 
> you use readFully against filesystems without seeks, provided you are only 
> reading from the current location.






[jira] [Commented] (HADOOP-9713) FSDataInputStream.readFully doesn't work on filesystems without seek -even when the offset==getPos

2020-05-03 Thread Mikhail Pryakhin (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-9713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098266#comment-17098266
 ] 

Mikhail Pryakhin commented on HADOOP-9713:
--

Another option is to defer the seek call until the next 
`FSDataInputStream.read(long position, byte[] buffer, int offset, int length)` 
invocation, making it lazy. Subsequent reads normally continue from the 
position where the previous read finished, so no seek is needed in that case. 
We only need to seek when 
`FSDataInputStream.getPos() != requested position`. In the standard read 
scenario, this drastically reduces the number of seeks.

> FSDataInputStream.readFully doesn't work on filesystems without seek -even 
> when the offset==getPos
> --
>
> Key: HADOOP-9713
> URL: https://issues.apache.org/jira/browse/HADOOP-9713
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.1.0-beta, 1.3.0, 3.0.0-alpha1
>Reporter: Steve Loughran
>Priority: Minor
>
> {{FSDataInputStream.readFully(offset,data)}} doesn't work even if the 
> offset==the current location -because it always seeks to the offset and seeks 
> back. No seek => Exception.
> We could optimise {{FSDataInputStream.readFully(offset,data)}} to eliminate 
> the seeks on these operations -which would have tangible benefits for those 
> filesystems where seek is expensive (remote blobstores). It would also let 
> you use readFully against filesystems without seeks, provided you are only 
> reading from the current location.






[jira] [Commented] (HADOOP-14566) Add seek support for SFTP FileSystem

2020-05-02 Thread Mikhail Pryakhin (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098041#comment-17098041
 ] 

Mikhail Pryakhin commented on HADOOP-14566:
---

Hey [~ste...@apache.org], could I take over this issue?

> Add seek support for SFTP FileSystem
> 
>
> Key: HADOOP-14566
> URL: https://issues.apache.org/jira/browse/HADOOP-14566
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Azhagu Selvan SP
>Priority: Minor
> Attachments: HADOOP-14566.001.patch, HADOOP-14566.patch
>
>
> This patch adds seek() method implementation for SFTP FileSystem and a unit 
> test for the same






[jira] [Commented] (HADOOP-9713) FSDataInputStream.readFully doesn't work on filesystems without seek -even when the offset==getPos

2020-05-02 Thread Mikhail Pryakhin (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-9713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098039#comment-17098039
 ] 

Mikhail Pryakhin commented on HADOOP-9713:
--

[~ste...@apache.org]

That's a great idea, but [the method Javadoc 
states|https://github.com/apache/hadoop/blob/ba66f3b454a5f6ea84f2cf7ac0082c555e2954a7/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/PositionedReadable.java#L59]
 that the file offset is not changed by the method invocation. This means we 
have to seek back to the initial position to leave the file offset unchanged, 
don't we?
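
To make the concern concrete, here is a minimal sketch of the seek, read, 
seek-back pattern that the issue description attributes to the positioned 
read. `SeekableBytes` and `readAt` are hypothetical names for illustration, not 
Hadoop classes; the in-memory source only counts how often `seek` is called.

```java
/** Sketch only: SeekableBytes and readAt are hypothetical, not Hadoop APIs. */
final class PositionedReadSketch {

  /** In-memory stand-in for a seekable stream; counts how often seek() is called. */
  static final class SeekableBytes {
    private final byte[] data;
    private int pos;
    int seekCount;

    SeekableBytes(byte[] data) { this.data = data; }

    long getPos() { return pos; }

    void seek(long target) {
      if (target < 0 || target > data.length) {
        throw new IllegalArgumentException("Cannot seek to " + target);
      }
      seekCount++;
      pos = (int) target;
    }

    /** Reads up to len bytes from the current offset and advances it; -1 at end of data. */
    int read(byte[] buf, int off, int len) {
      if (pos >= data.length) return -1;
      int n = Math.min(len, data.length - pos);
      System.arraycopy(data, pos, buf, off, n);
      pos += n;
      return n;
    }
  }

  /** Positioned read that leaves the stream offset where it found it. */
  static int readAt(SeekableBytes in, long position, byte[] buf, int off, int len) {
    long oldPos = in.getPos();
    try {
      in.seek(position);             // first seek: go to the requested position
      return in.read(buf, off, len); // the read itself advances the offset
    } finally {
      in.seek(oldPos);               // second seek: restore the caller's offset
    }
  }

  public static void main(String[] args) {
    SeekableBytes in = new SeekableBytes(new byte[32]);
    byte[] buf = new byte[8];
    // The requested position equals the current offset, yet two seeks still happen.
    int n = readAt(in, in.getPos(), buf, 0, buf.length);
    System.out.println("read=" + n + " pos=" + in.getPos() + " seeks=" + in.seekCount);
  }
}
```

Even when the requested position equals the current offset, the read advances 
the offset, so keeping the documented contract costs a seek back unless the 
stream tracks the offset in some other way.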

 







[jira] [Updated] (HADOOP-15358) SFTPConnectionPool connections leakage

2018-08-22 Thread Mikhail Pryakhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Pryakhin updated HADOOP-15358:
--
Release Note: Fixed SFTPConnectionPool connections leakage
  Attachment: HADOOP-15358.001.patch
  Status: Patch Available  (was: Open)

Fixed SFTPConnectionPool connections leakage

> SFTPConnectionPool connections leakage
> --
>
> Key: HADOOP-15358
> URL: https://issues.apache.org/jira/browse/HADOOP-15358
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 3.0.0
>Reporter: Mikhail Pryakhin
>Assignee: Mikhail Pryakhin
>Priority: Critical
> Attachments: HADOOP-15358.001.patch
>
>
> Methods of SFTPFileSystem operate on poolable ChannelSftp instances. Some of 
> these methods are chained together, so that a single compound action 
> establishes multiple connections to the SFTP server. The affected methods are 
> listed below:
>  # mkdirs method. The public mkdirs method acquires a new ChannelSftp from 
> the pool [1] and then recursively creates directories, first checking whether 
> each directory exists by calling the exists method [2], which delegates to 
> the getFileStatus(ChannelSftp channel, Path file) method [3], and so on until 
> a FileStatus instance is returned [4]. The resource leakage occurs in the 
> getWorkingDirectory method, which calls the getHomeDirectory method [5], 
> which in turn establishes a new connection to the SFTP server instead of 
> using the already created connection. As the mkdirs method is recursive, this 
> results in creating a huge number of connections.
>  # open method [6]. This method returns an FSDataInputStream that wraps an 
> SFTPInputStream instance, which closes the acquired ChannelSftp instance [7] 
> instead of returning it to the pool. This leads to establishing another 
> connection to the SFTP server when the next method is called on the 
> FileSystem instance.
> [1] 
> https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L658
> [2] 
> https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L321
> [3] 
> https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L202
> [4] 
> https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L290
> [5] 
> https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L640
> [6] 
> https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L504
> [7] 
> https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPInputStream.java#L123



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15358) SFTPConnectionPool connections leakage

2018-04-03 Thread Mikhail Pryakhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Pryakhin updated HADOOP-15358:
--
Description: 
Methods of SFTPFileSystem operate on poolable ChannelSftp instances. Some of 
these methods are chained together, so that a single compound action 
establishes multiple connections to the SFTP server. The affected methods are 
listed below:
 # mkdirs method. The public mkdirs method acquires a new ChannelSftp from the 
pool [1] and then recursively creates directories, first checking whether each 
directory exists by calling the exists method [2], which delegates to the 
getFileStatus(ChannelSftp channel, Path file) method [3], and so on until a 
FileStatus instance is returned [4]. The resource leakage occurs in the 
getWorkingDirectory method, which calls the getHomeDirectory method [5], which 
in turn establishes a new connection to the SFTP server instead of using the 
already created connection. As the mkdirs method is recursive, this results in 
creating a huge number of connections.
 # open method [6]. This method returns an FSDataInputStream that wraps an 
SFTPInputStream instance, which closes the acquired ChannelSftp instance [7] 
instead of returning it to the pool. This leads to establishing another 
connection to the SFTP server when the next method is called on the FileSystem 
instance.


[1] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L658

[2] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L321

[3] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L202

[4] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L290

[5] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L640

[6] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L504

[7] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPInputStream.java#L123

  was:
Methods of SFTPFileSystem operate on poolable ChannelSftp instances, thus some 
methods of SFTPFileSystem are chained together resulting in establishing 
multiple connections to the SFTP server to accomplish one compound action, 
those methods are listed below:
 # mkdirs method
the public mkdirs method acquires a new ChannelSftp [from the 
pool|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L658]]
and then recursively creates directories, checking for the directory existence 
beforehand by calling the method 
[exists|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L321]
] which delegates to the getFileStatus(ChannelSftp channel, Path file) 
[method|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L202]]
 and so on until it ends up in returning the [FilesStatus 
instance|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L290]].
 The resource leakage occurs in the method getWorkingDirectory which calls the 
getHomeDirectory 
[method|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L640]]
 which in turn establishes a new connection to the sftp server instead of using 
an already created connection. As the mkdirs method is recursive this results 
in creating a huge number of connections.
 # open 
[method|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L504]].
 This method returns an instance of FSDataInputStream which consumes 
SFTPInputStream instance which doesn't return an acquired ChannelSftp 

[jira] [Updated] (HADOOP-15358) SFTPConnectionPool connections leakage

2018-04-03 Thread Mikhail Pryakhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Pryakhin updated HADOOP-15358:
--
Description: 
Methods of SFTPFileSystem operate on poolable ChannelSftp instances, thus some 
methods of SFTPFileSystem are chained together resulting in establishing 
multiple connections to the SFTP server to accomplish one compound action, 
those methods are listed below:
 # mkdirs method
the public mkdirs method acquires a new ChannelSftp [from the 
pool|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L658]]
and then recursively creates directories, checking for the directory existence 
beforehand by calling the method 
[exists|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L321]
] which delegates to the getFileStatus(ChannelSftp channel, Path file) 
[method|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L202]]
 and so on until it ends up in returning the [FilesStatus 
instance|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L290]].
 The resource leakage occurs in the method getWorkingDirectory which calls the 
getHomeDirectory 
[method|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L640]]
 which in turn establishes a new connection to the sftp server instead of using 
an already created connection. As the mkdirs method is recursive this results 
in creating a huge number of connections.
 # open 
[method|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L504]].
 This method returns an instance of FSDataInputStream which consumes 
SFTPInputStream instance which doesn't return an acquired ChannelSftp instance 
back to the pool but instead it 
[closes|[https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPInputStream.java#L123]]
 it. This leads to establishing another connection to an SFTP server when the 
next method is called on the FileSystem instance.

 

  was:
Methods of SFTPFileSystem operate on poolable ChannelSftp instances, thus some 
methods of SFTPFileSystem are chained together resulting in establishing 
multiple connections to the SFTP server to accomplish one compound action, 
those methods are listed below:
 # mkdirs method
the public mkdirs method acquires a new ChannelSftp from the pool [1]
and then recursively creates directories, checking for the directory existence 
beforehand by calling the method exists[2] which delegates to the 
getFileStatus(ChannelSftp channel, Path file) method [3] and so on until it 
ends up in returning the FilesStatus instance [4]. The resource leakage occurs 
in the method getWorkingDirectory which calls the getHomeDirectory method [5] 
which in turn establishes a new connection to the sftp server instead of using 
an already created connection. As the mkdirs method is recursive this results 
in creating a huge number of connections.
 # open method [6]   This method returns an instance of FSDataInputStream which 
consumes SFTPInputStream instance which doesn't return an acquired ChannelSftp 
instance back to the pool but instead it closes it[7]. This leads to 
establishing another connection to an SFTP server when the next method is 
called on the FileSystem instance.


[1] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L658

[2] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L321

[3] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L202

[4] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L290

[5] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L640

[6] 

[jira] [Created] (HADOOP-15358) SFTPConnectionPool connections leakage

2018-04-03 Thread Mikhail Pryakhin (JIRA)
Mikhail Pryakhin created HADOOP-15358:
-

 Summary: SFTPConnectionPool connections leakage
 Key: HADOOP-15358
 URL: https://issues.apache.org/jira/browse/HADOOP-15358
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.0.0
Reporter: Mikhail Pryakhin


Methods of SFTPFileSystem operate on poolable ChannelSftp instances. Some of 
these methods are chained together, so that a single compound action 
establishes multiple connections to the SFTP server. The affected methods are 
listed below:
 # mkdirs method. The public mkdirs method acquires a new ChannelSftp from the 
pool [1] and then recursively creates directories, first checking whether each 
directory exists by calling the exists method [2], which delegates to the 
getFileStatus(ChannelSftp channel, Path file) method [3], and so on until a 
FileStatus instance is returned [4]. The resource leakage occurs in the 
getWorkingDirectory method, which calls the getHomeDirectory method [5], which 
in turn establishes a new connection to the SFTP server instead of using the 
already created connection. As the mkdirs method is recursive, this results in 
creating a huge number of connections.
 # open method [6]. This method returns an FSDataInputStream that wraps an 
SFTPInputStream instance, which closes the acquired ChannelSftp instance [7] 
instead of returning it to the pool. This leads to establishing another 
connection to the SFTP server when the next method is called on the FileSystem 
instance.


[1] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L658

[2] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L321

[3] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L202

[4] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L290

[5] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L640

[6] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPFileSystem.java#L504

[7] 
https://github.com/apache/hadoop/blob/736ceab2f58fb9ab5907c5b5110bd44384038e6b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/sftp/SFTPInputStream.java#L123
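
The leak described for the mkdirs method can be sketched as follows. This is 
only an illustration under assumptions: `Channel`, `Pool`, `homeLeaky` and 
`homeReusing` are hypothetical stand-ins, not the SFTPFileSystem or JSch APIs, 
and "opening a connection" is modelled by a counter. The sketch contrasts a 
helper that acquires its own channel with one that reuses the channel the 
caller already holds.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Sketch only: Channel, Pool and the methods below are hypothetical stand-ins. */
final class ChannelReuseSketch {

  /** Stand-in for a pooled SFTP channel. */
  static final class Channel {
    String home() { return "/home/user"; }
  }

  /** Minimal pool that counts how many connections it had to open. */
  static final class Pool {
    private final Deque<Channel> idle = new ArrayDeque<>();
    int opened;

    synchronized Channel acquire() {
      Channel c = idle.poll();
      if (c == null) {
        opened++;          // models connecting to the server
        c = new Channel();
      }
      return c;
    }

    synchronized void release(Channel c) { idle.push(c); }
  }

  private final Pool pool;

  ChannelReuseSketch(Pool pool) { this.pool = pool; }

  /** Leaky variant: acquires its own channel and never gives it back. */
  String homeLeaky() {
    return pool.acquire().home();
  }

  /** Reusing variant: works on the channel the caller already holds. */
  String homeReusing(Channel held) {
    return held.home();
  }

  /** Resolves each path component against the home directory, one lookup per component. */
  List<String> mkdirs(String path, boolean reuse) {
    Channel c = pool.acquire();
    try {
      List<String> resolved = new ArrayList<>();
      String current = "";
      for (String part : path.split("/")) {
        if (part.isEmpty()) continue;
        String home = reuse ? homeReusing(c) : homeLeaky();
        current = current + "/" + part;
        resolved.add(home + current);
      }
      return resolved;
    } finally {
      pool.release(c);     // always hand the channel back to the pool
    }
  }

  public static void main(String[] args) {
    Pool leaky = new Pool();
    new ChannelReuseSketch(leaky).mkdirs("a/b/c", false);
    Pool reusing = new Pool();
    new ChannelReuseSketch(reusing).mkdirs("a/b/c", true);
    System.out.println("leaky opened=" + leaky.opened + ", reusing opened=" + reusing.opened);
  }
}
```

With a three-component path the leaky variant opens four connections in this 
sketch, while the reusing variant opens one. The same release-instead-of-close 
idea would presumably apply to the stream returned by the open method, which 
could hand its channel back to the pool when it is closed.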


