Aron Stansvik wrote:
2007/2/20, Phil Knirsch <[EMAIL PROTECTED]>:
Aron Stansvik wrote:
> Hello fellow urlgrabbers, is this intentional behavior?
>
> $ urlgrabber ftp://user:[EMAIL PROTECTED]/non_existing_file
> non_existing_file                                                  0 B
> 00:00
> file written to non_existing_file
>
> It happily saves a 0 byte file when I try to fetch a file that isn't
> there. Shouldn't it fail on me? How can I make it fail when the file
> is not present on the server?
>
> Best regards,
> Aron Stansvik

Funnily enough, I stumbled across the exact same problem yesterday. I'm
currently debugging the cause and I'm down to urllib.py, which makes the
same mistake, and ftplib.py, which works correctly and fails with an error 550.

Yep. I'm guessing bullet point number 7 under "Restrictions" at the
bottom of http://docs.python.org/lib/module-urllib.html might be the
culprit, but I'm not sure.

For now I'm using ftplib directly in my script, which works but is not
as nice as urllib or urlgrabber when all you want is to download a
file.
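
For anyone wanting the same workaround, here is a minimal sketch of fetching a file with ftplib directly. The function name fetch_file and its parameters are made up for illustration; it assumes an already logged-in ftplib.FTP object and buffers the whole file in memory, so it only suits small files.

```python
import ftplib


def fetch_file(ftp, remote_name, local_path):
    """Download remote_name through a logged-in ftplib.FTP-like object.

    If the server refuses RETR (for example with a 550 reply), ftplib
    raises ftplib.error_perm before local_path is ever opened, so no
    empty file is left behind.
    """
    chunks = []
    # retrbinary() calls the callback once per received block of data.
    ftp.retrbinary('RETR ' + remote_name, chunks.append)
    with open(local_path, 'wb') as out:
        out.write(b''.join(chunks))
    return local_path


# Hypothetical usage (host, credentials and file names are placeholders):
#   ftp = ftplib.FTP('ftp.example.org')
#   ftp.login('user', 'password')
#   fetch_file(ftp, 'non_existing_file', 'out.bin')  # raises ftplib.error_perm
```

Because the local file is only opened after the transfer succeeded, a missing remote file shows up as an exception rather than as a 0 byte file.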

Hopefully I'll have a patch ready tonight. I'll send it upstream to
Python then, so it might take some more time until that finds its way
down again.

Great! Thanks for the fast response.

Regards,
Aron


OK, the fix is done and submitted to Python upstream. The problem came from the urllib.ftpwrapper.retrfile() method. Here are my comments on the Python entry:

When trying to retrieve a nonexistent file using the urllib.ftpwrapper.retrfile() method, the behaviour is that instead of an error message a valid hook is returned and you will receive a 0 byte file.

The current behaviour tries to emulate what one typically sees with http servers and DirIndexes, which means:

1) Try to RETR the file.
2) If that fails, assume it is a directory and LIST it.

Unfortunately it doesn't check whether the directory actually
exists.
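
To make the two steps concrete, here is a rough sketch of that fallback, written in current Python syntax. It is not the urllib source; the function name open_transfer_unchecked is made up, and ftp is assumed to be an ftplib.FTP-like object.

```python
import ftplib


def open_transfer_unchecked(ftp, path):
    """Sketch of the unchecked fallback: RETR first, then LIST.

    Returns whatever ftp.ntransfercmd() returns, i.e. a
    (connection, expected_size) tuple.
    """
    try:
        # 1) Try to RETR the file in binary mode.
        ftp.voidcmd('TYPE I')
        return ftp.ntransfercmd('RETR ' + path)
    except ftplib.error_perm as reason:
        # Only a 550 reply is treated as "maybe it is a directory".
        if str(reason)[:3] != '550':
            raise
    # 2) Assume it is a directory and LIST it, without verifying that
    #    the directory exists.
    ftp.voidcmd('TYPE A')
    return ftp.ntransfercmd('LIST ' + path)
```

If the server answers LIST for an unknown path with an empty listing instead of an error (which appears to be what happens here), the caller gets a perfectly valid data connection that delivers zero bytes.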

The attached patch fixes this by remembering the current directory using the PWD command, then temporarily changing to the requested directory and switching back to the previous working directory if that was successful.

If not, we raise an IOError, as the path could neither be opened as a file (RETR) nor was it a directory.

That way the behaviour is even closer to what happens with http servers, where we get a 404 when we try to access a nonexistent file or directory.

Storing the current directory and switching back to it in case of no error will also put the connection back in the proper state and directory, so no unexpected behaviour happens here.
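
For readers on a newer Python, the check described above could look roughly like this as a standalone helper. The name list_command_for is hypothetical and does not appear in the patch; ftp is assumed to be an ftplib.FTP-like object.

```python
import ftplib


def list_command_for(ftp, path):
    """Return the LIST command for path after checking it is a directory.

    Raises IOError if the server refuses to change into path.
    """
    if not path:
        return 'LIST'
    previous = ftp.pwd()          # remember where we are (PWD)
    try:
        ftp.cwd(path)             # only succeeds for an existing directory
    except ftplib.error_perm as reason:
        raise IOError('ftp error: %s' % reason) from reason
    ftp.cwd(previous)             # restore the original working directory
    return 'LIST ' + path
```

On failure the working directory is untouched, since the CWD never succeeded; on success it is explicitly restored, so later commands on the same connection see the directory they expect.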

The patch is against the current SVN repository at revision 53833.

Read ya, Phil

PS: Attached said patch.

--
Philipp Knirsch      | Tel.:  +49-711-96437-470
Development          | Fax.:  +49-711-96437-111
Red Hat GmbH         | Email: Phil Knirsch <[EMAIL PROTECTED]>
Hauptstaetterstr. 58 | Web:   http://www.redhat.de/
D-70178 Stuttgart
Motd:  You're only jealous cos the little penguins are talking to me.
--- urllib.py.old	2007-02-20 18:16:34.000000000 +0100
+++ urllib.py	2007-02-20 18:16:13.000000000 +0100
@@ -866,8 +866,15 @@
         if not conn:
             # Set transfer mode to ASCII!
             self.ftp.voidcmd('TYPE A')
-            # Try a directory listing
-            if file: cmd = 'LIST ' + file
+            # Try a directory listing. Verify that directory exists.
+            if file:
+                pwd = self.ftp.pwd()
+                try:
+                    self.ftp.cwd(file)
+                except ftplib.error_perm, reason:
+                    raise IOError, reason, sys.exc_info()[2]
+                self.ftp.cwd(pwd)
+                cmd = 'LIST ' + file
             else: cmd = 'LIST'
             conn = self.ftp.ntransfercmd(cmd)
         self.busy = 1
_______________________________________________
Yum-devel mailing list
[email protected]
https://lists.dulug.duke.edu/mailman/listinfo/yum-devel
