[issue27973] urllib.urlretrieve() fails on second ftp transfer

2016-11-10 Thread Sohaib Ahmad

Sohaib Ahmad added the comment:

@Senthil, thanks for looking into this.

Looking forward to your commit.

Regards.

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27973>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue27973] urllib.urlretrieve() fails on second ftp transfer

2016-11-04 Thread Sohaib Ahmad

Sohaib Ahmad added the comment:

Could someone please review this patch so that it can go into 2.7.13 when it
comes out?

--




[issue27973] urllib.urlretrieve() fails on second ftp transfer

2016-09-17 Thread Sohaib Ahmad

Sohaib Ahmad added the comment:

I finally found the actual problem causing the second download to fail.
urlretrieve() uses FTP in PASV mode, and in PASV mode, after sending the file
to the client, the FTP server sends an ACK that the file has been transferred.
After the fix of issue1067702, the socket was being closed without receiving
this ACK.

Now, when a user tries to download the same file or another file from the same
directory, the key (host, port, dirs) remains the same, so open_ftp() skips FTP
initialization. Because of this, the previous FTP connection is reused, and
when new commands are sent, the server first sends the previous ACK. This
causes a domino effect: each response is delayed by one, and we get an
exception from parse227().
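
To make the reuse concrete, here is a minimal sketch of a connection cache keyed the way the paragraph above describes. FakeConnection and get_connection are hypothetical names used only for illustration, and the host is a placeholder; this is not the code in urllib.py.

```python
class FakeConnection:
    """Hypothetical stand-in for an FTP wrapper; counts how often it is used."""

    def __init__(self):
        self.uses = 0


# Cache of open connections, keyed by (host, port, joined directory path)
ftpcache = {}


def get_connection(host, port, dirs):
    key = (host, port, "/".join(dirs))
    if key not in ftpcache:
        # Only a new key triggers initialization of a fresh connection
        ftpcache[key] = FakeConnection()
    conn = ftpcache[key]
    conn.uses += 1
    return conn


# Same host, port and directory: the second call reuses the first connection
a = get_connection("ftp.example.org", 21, ["debian", "dists"])
b = get_connection("ftp.example.org", 21, ["debian", "dists"])
# A different directory produces a different key and a new connection
c = get_connection("ftp.example.org", 21, ["pub"])
```

Because a and b are the same object, any reply left unread after the first transfer is still waiting on that connection when the second transfer starts.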

Expected response:
*cmd* 'RETR Contents-udeb-ppc64el.gz'
*resp* '150 Opening BINARY mode data connection for Contents-udeb-ppc64el.gz (26555 bytes).'
*resp* '226 Transfer complete.'

*cmd* 'TYPE I'
*resp* '200 Switching to Binary mode.'
*cmd* 'PASV'
*resp* '227 Entering Passive Mode (130,239,18,173,137,59).'

Actual response:
*cmd* 'RETR Contents-udeb-ppc64el.gz'
*resp* '150 Opening BINARY mode data connection for Contents-udeb-ppc64el.gz (26555 bytes).'

*cmd* 'TYPE I'
*resp* '226 Transfer complete.'
*cmd* 'PASV'
*resp* '200 Switching to Binary mode.'
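
The off-by-one effect in the two logs can be reproduced without a server. The sketch below is only a simulation: the reply strings are taken from the logs above, send() is a made-up helper that hands out the next unread reply, and parse227() and error_reply are the real ftplib names (imported here from Python 3's ftplib).

```python
from collections import deque
from ftplib import error_reply, parse227

# Replies waiting on the control connection when the second transfer starts
replies = [
    "226 Transfer complete.",  # left over from the first transfer, never read
    "200 Switching to Binary mode.",
    "227 Entering Passive Mode (130,239,18,173,137,59).",
]


def send(cmd, pending):
    """Pretend to send cmd and return the next unread reply."""
    return pending.popleft()


def second_transfer(pending):
    send("TYPE I", pending)
    # parse227() raises error_reply unless the reply starts with 227
    return parse227(send("PASV", pending))


# Out of sync: TYPE I gets the stale 226, so PASV gets the 200 reply
try:
    second_transfer(deque(replies))
    outcome = "ok"
except error_reply as exc:
    outcome = str(exc)

# In sync: the stale 226 is consumed before any new command is sent
in_sync = deque(replies)
in_sync.popleft()
host, port = second_transfer(in_sync)
```

In the out-of-sync case the exception carries the "200 Switching to Binary mode." text, which matches the error reported in the traceback.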

I am attaching a new patch (urllib.patch) which fixes this problem by first
clearing any pending FTP server responses when an existing connection is
reused to download a file. Please review and let me know if it looks good.
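
For readers without the attachment, one way to picture "clearing the server responses" is shown below. This is not the content of urllib.patch, which is not reproduced in this thread. It is a sketch under assumptions: drain_pending_replies is a hypothetical helper, and socket.socketpair() stands in for the FTP control connection.

```python
import select
import socket


def drain_pending_replies(sock, wait=0.1):
    """Read and return any bytes already queued on sock, without blocking long."""
    chunks = []
    while True:
        readable, _, _ = select.select([sock], [], [], wait)
        if not readable:
            break  # nothing more arrived within the wait period
        data = sock.recv(4096)
        if not data:
            break  # peer closed the connection
        chunks.append(data)
    return b"".join(chunks)


# Simulated control connection: "server" end and "client" end
server, client = socket.socketpair()

# The server acknowledges the first transfer, but nobody reads the reply
server.sendall(b"226 Transfer complete.\r\n")

# Before reusing the connection, the client discards whatever is pending
stale = drain_pending_replies(client)

# The next reply the client reads now belongs to the next command
server.sendall(b"200 Switching to Binary mode.\r\n")
fresh = client.recv(4096)

server.close()
client.close()
```

A drain based on a short wait is a heuristic: a reply that arrives after the wait period would still be missed, so the wait value is a trade-off rather than a guarantee.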

--
Added file: http://bugs.python.org/file44712/urllib.patch




[issue27973] urllib.urlretrieve() fails on second ftp transfer

2016-09-17 Thread Sohaib Ahmad

Changes by Sohaib Ahmad <sohaib0...@gmail.com>:


Removed file: http://bugs.python.org/file44692/urllib.patch




[issue27973] urllib.urlretrieve() fails on second ftp transfer

2016-09-16 Thread Sohaib Ahmad

Sohaib Ahmad added the comment:

Hi Senthil,

Thanks for the review. Now that I look at it, even with a default value, an
FTP-specific parameter does break the open() API abstraction.

--




[issue27973] urllib.urlretrieve() fails on second ftp transfer

2016-09-16 Thread Sohaib Ahmad

Sohaib Ahmad added the comment:

The attached patch fixes the problem with multiple ftp downloads while keeping 
the fix for issue1067702 intact.

The fix basically uses a new parameter ftp_retrieve to change the behavior of 
ftpwrapper.retrfile() if it is being called by urlretrieve().

I am not familiar with the process of contributing a patch to the Python repo,
so please review and commit the attached urllib.patch file.

Tested with urlopen (https, http, ftp) and urlretrieve (ftp).

--
keywords: +patch
Added file: http://bugs.python.org/file44692/urllib.patch




[issue27973] urllib.urlretrieve() fails on second ftp transfer

2016-09-15 Thread Sohaib Ahmad

Sohaib Ahmad added the comment:

I manually reverted the issue26960 patch, which fixed my issue with consecutive
downloads but also caused issue26960 to regress.

I am looking into what could be causing the hang when voidresp() is called,
using the demo available in issue26960. It looks like the following happens
when urlopen() is called:

urlopen() > URLopener.open() > URLopener.open_ftp() > ftpwrapper.retrfile() >
FTP.ntransfercmd()

retrfile() calls FTP.ntransfercmd() in ftplib, which sends the RETR command to
the FTP server. If I understand correctly, RETR asks the server to send a copy
of the file. If RETR does retrieve the complete file, then I think the behavior
after reverting the issue26960 patch is fine, and the hang would be expected
for large files.

I think we can fix this freeze for large files but I have two questions 
regarding this:

1) Is urlopen() supposed to download complete files? From the Python docs, it
looks like it only returns a network object, or raises an exception in case of
an invalid URL.

2) If it is not supposed to download complete files, can we switch to LIST
instead of RETR for FTP files?

I'd be grateful if a urllib / ftplib expert could answer the above questions.

--




[issue27973] urllib.urlretrieve() fails on second ftp transfer

2016-09-15 Thread Sohaib Ahmad

Sohaib Ahmad added the comment:

I didn't know that urllib.urlopen() retrieves the complete object in the case
of FTP. When getresp() is called for big files (such as the one in
issue26960), the RETR command has been initiated and the server has returned
code 150, which means "stand by for another reply". That is where control got
stuck and issue26960 was reported.

This is the end of the debug log for the file mentioned in issue26960, after
which control got stuck:

*cmd* 'TYPE I'
*put* 'TYPE I\r\n'
*get* '200 Type set to I\r\n'
*resp* '200 Type set to I'
*cmd* 'PASV'
*put* 'PASV\r\n'
*get* '227 Entering Passive Mode (130,133,3,130,207,26).\r\n'
*resp* '227 Entering Passive Mode (130,133,3,130,207,26).'
*cmd* 'RETR ratings.list.gz'
*put* 'RETR ratings.list.gz\r\n'
*get* '150 Opening BINARY mode data connection for ratings.list.gz (12643237 bytes)\r\n'
*resp* '150 Opening BINARY mode data connection for ratings.list.gz (12643237 bytes)'

And this is the end of the debug log for a very small file transfer over FTP:

*cmd* 'PASV'
*put* 'PASV\r\n'
*get* '227 Entering Passive Mode (130,239,18,165,234,243).\r\n'
*resp* '227 Entering Passive Mode (130,239,18,165,234,243).'
*cmd* 'RETR Contents-udeb-ppc64el.gz'
*put* 'RETR Contents-udeb-ppc64el.gz\r\n'
*get* '150 Opening BINARY mode data connection for Contents-udeb-ppc64el.gz (26555 bytes).\r\n'
*resp* '150 Opening BINARY mode data connection for Contents-udeb-ppc64el.gz (26555 bytes).'
*get* '226 Transfer complete.\r\n'
*resp* '226 Transfer complete.'

Control returned successfully once the FTP server returned 2xx.

Please correct me if I am wrong, but from the RETR command it looks like it is
trying to retrieve the whole file in both cases. Is urlopen() supposed to
retrieve files when called, or just get the headers/information etc.?
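
The difference between the two logs comes down to the class of the reply code. As a rough illustration, the first digit of an FTP reply says whether the server is done: 1xx is a preliminary reply (another reply will follow) and 2xx is a completion reply. The helper name reply_kind is made up here and is not part of ftplib.

```python
# First digit of an FTP reply code and what it signals
REPLY_KINDS = {
    "1": "preliminary",      # action started, expect another reply
    "2": "completion",       # action finished successfully
    "3": "intermediate",     # server needs a further command
    "4": "transient error",
    "5": "permanent error",
}


def reply_kind(line):
    """Classify an FTP reply line by the first digit of its code."""
    return REPLY_KINDS.get(line[:1], "unknown")


# 150 only announces the data connection; 226 closes the exchange
opening = reply_kind("150 Opening BINARY mode data connection")
done = reply_kind("226 Transfer complete.")
```

A client that waits for the completion reply after 150 will block until the data transfer has finished, which fits the large-file log ending at 150.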

--




[issue27973] urllib.urlretrieve() fails on second ftp transfer

2016-09-15 Thread Sohaib Ahmad

Sohaib Ahmad added the comment:

Thank you for pointing me towards hg bisect. I got some time to look into it 
and was able to find the commit that broke this functionality.

A fix from Python 3 was backported in the issue "urllib hangs when closing
connection", which removed a call to ftp.voidresp(). Without this call, the
second download using urlretrieve() now fails in 2.7.12.

Issue ID:
http://bugs.python.org/issue26960

Commit ID:
https://hg.python.org/cpython/rev/44d02a5d59fb

voidresp() itself calls getresp(), so issue26960 could occur because control
never returns from getresp().

In my opinion this commit (101286) should be reverted and getresp() should be 
updated with some sort of timeout to fix issue26960.
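
To illustrate the general idea of a timeout on a blocking reply read (not what any committed fix does), here is a small sketch. read_reply is a hypothetical helper and socket.socketpair() stands in for the control connection. ftplib.FTP also accepts a timeout argument, which is the standard way to bound its blocking operations.

```python
import socket


def read_reply(sock, timeout):
    """Return the next chunk from sock, or None if nothing arrives in time."""
    sock.settimeout(timeout)
    try:
        return sock.recv(4096)
    except socket.timeout:
        return None  # give up instead of blocking forever


server, client = socket.socketpair()

# The server stays silent: without a timeout this read would hang
silent = read_reply(client, 0.2)

# Once the server answers, the same call returns the reply
server.sendall(b"226 Transfer complete.\r\n")
answered = read_reply(client, 0.2)

server.close()
client.close()
```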

--




[issue27973] urllib.urlretrieve() fails on second ftp transfer

2016-09-07 Thread Sohaib Ahmad

Sohaib Ahmad added the comment:

I am not very familiar with Mercurial. I will try to set up the development
environment.

Traceback is:

[Errno ftp error] 200 Switching to Binary mode.
Traceback (most recent call last):
  File "multiple_ftp_download.py", line 49, in main
    file2_path = download_from_url(url2, local_folder=tmpDir)
  File "multiple_ftp_download.py", line 32, in download_from_url
    filename = urllib.urlretrieve(url, local_path)[0]
  File "C:\Python27\lib\urllib.py", line 98, in urlretrieve
    return opener.retrieve(url, filename, reporthook, data)
  File "C:\Python27\lib\urllib.py", line 245, in retrieve
    fp = self.open(url, data)
  File "C:\Python27\lib\urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "C:\Python27\lib\urllib.py", line 558, in open_ftp
    (fp, retrlen) = self.ftpcache[key].retrfile(file, type)
  File "C:\Python27\lib\urllib.py", line 906, in retrfile
    conn, retrlen = self.ftp.ntransfercmd(cmd)
  File "C:\Python27\lib\ftplib.py", line 334, in ntransfercmd
    host, port = self.makepasv()
  File "C:\Python27\lib\ftplib.py", line 312, in makepasv
    host, port = parse227(self.sendcmd('PASV'))
  File "C:\Python27\lib\ftplib.py", line 830, in parse227
    raise error_reply, resp
IOError: [Errno ftp error] 200 Switching to Binary mode.

--




[issue27973] urllib.urlretrieve() fails on second ftp transfer

2016-09-06 Thread Sohaib Ahmad

New submission from Sohaib Ahmad:

urllib.urlretrieve() fails on ftp:
- start and complete a transfer
- immediately start another transfer
The second transfer will fail with the following error:
[Errno ftp error] 200 Type set to I

I am using urllib.urlretrieve(url, filename) to retrieve two files (one by one)
from an FTP server.

Sample code to reproduce the problem is attached. Please update url1 and url2 
with correct values.
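
For readers without the attachment, the shape of the scenario is roughly as follows. This is a sketch, not the attached multiple_ftp_download.py. It is written for Python 3, where the function lives in urllib.request (the report concerns Python 2.7's urllib.urlretrieve). The main() signature and the fallback file name "download" are assumptions made for illustration.

```python
import os
import tempfile
from urllib.parse import urlparse
from urllib.request import urlretrieve  # Python 2.7 spelling: urllib.urlretrieve


def download_from_url(url, local_folder):
    """Fetch url into local_folder and return the path of the saved file."""
    name = os.path.basename(urlparse(url).path) or "download"
    local_path = os.path.join(local_folder, name)
    return urlretrieve(url, local_path)[0]


def main(url1, url2):
    """Download two files one after the other into a fresh temp directory."""
    tmp_dir = tempfile.mkdtemp()
    file1_path = download_from_url(url1, local_folder=tmp_dir)
    # The report says this second call is the one that fails on 2.7.12
    file2_path = download_from_url(url2, local_folder=tmp_dir)
    return file1_path, file2_path
```

Calling main() with two ftp:// URLs pointing at the same server and directory matches the scenario described above; the URLs themselves have to be supplied by the reader.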

This problem was reported several years ago and was fixed, but it is now
reproducible on the latest Python 2.7 package (2.7.12).

http://bugs.python.org/issue1067702

I tried the same scenario on 2.7.10 and it worked fine. So a patch after 2.7.10 
must have broken something.

--
components: Library (Lib)
files: multiple_ftp_download.py
messages: 274559
nosy: Sohaib Ahmad
priority: normal
severity: normal
status: open
title: urllib.urlretrieve() fails on second ftp transfer
type: behavior
versions: Python 2.7
Added file: http://bugs.python.org/file44396/multiple_ftp_download.py




[issue1067702] urllib fails with multiple ftp transfers

2016-09-06 Thread Sohaib Ahmad

Sohaib Ahmad added the comment:

The problem is reproducible on the latest Python 2.7 package (2.7.12).

I tried the same scenario on 2.7.10 and it worked fine. I am not sure whether
this issue can be reopened or whether I should create a new one.

In my case the first transfer succeeds, but the second FTP transfer fails with
the error:
[Errno ftp error] 200 Type set to I

I am using urllib.urlretrieve(url, local_path) to retrieve two files (one by
one) from an FTP server.

--
nosy: +Sohaib Ahmad

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue1067702>
___