Re: Speed ain't bad
John Machin <[EMAIL PROTECTED]> wrote:
> 1. Robustness: Both versions will crash (in the sense of an unhandled [...]
> 2. Efficiency: I don't see the disk I/O inefficiency in calling [...]

3. Don't itemise perceived flaws in other people's postings. It may
give off a hostile impression.

> 1. Robustness: Both versions will crash (in the sense of an unhandled
> exception) in the situation where zfdir exists but is not a directory.
> The revised version just crashes later than the OP's version :-(
> Trapping EnvironmentError seems not very useful -- the result will not
> distinguish (on Windows 2000 at least) between the 'existing dir' and
> 'existing non-directory' cases.

Good point; my version has room for improvement. But at least it fixes
the race condition between isdir and makedirs. What I like about
EnvironmentError is that it's easier to use than figuring out which one
of IOError or OSError applies (and whether that can be relied on,
cross-platform).

> 2. Efficiency: I don't see the disk I/O inefficiency in calling
> os.path.isdir() before os.makedirs() -- if the relevant part of the
> filesystem wasn't already in memory, the isdir() call would make it
> so, and makedirs() would get a free ride, yes/no?

Perhaps. Looking stuff up in operating system tables and buffers takes
time too. And then there's network latency; how much local caching do
you get for an NFS mount or SMB share? If you really want to know,
measure.

- Anders
--
http://mail.python.org/mailman/listinfo/python-list
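For readers who have not met it: EnvironmentError is the common base
class of both IOError and OSError in Python 2.x, which is what makes it
convenient here. A minimal illustration (the paths are invented):

import os

try:
    os.makedirs('backup/2005')          # failure raises OSError
    log = open('backup/2005/log', 'w')  # failure raises IOError
except EnvironmentError, e:
    # one handler catches both, whichever the platform raises
    print 'filesystem problem:', e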
Re: Speed ain't bad
Anders J. Munch wrote:
> Another way is the strategy of "it's easier to ask forgiveness than
> to ask permission". If you replace:
>     if(not os.path.isdir(zfdir)):
>         os.makedirs(zfdir)
> with:
>     try:
>         os.makedirs(zfdir)
>     except EnvironmentError:
>         pass
> then not only will your script become a micron more robust, but
> assuming zfdir typically does not exist, you will have saved the call
> to os.path.isdir.

... at the cost of an exception frame setup and an incomplete call to
os.makedirs(). It's an open question whether the exception setup and
recovery take less time than the call to isdir(), though I'd expect
probably not. The exception route definitely makes more sense if the
makedirs() call is likely to succeed; if it's likely to fail, then
things are murkier.

Since isdir() *is* a disk I/O operation, the exception route is
probably preferable in this case anyhow. In either case, one must touch
the disk; in the exception case, there will only ever be one disk
access (which either succeeds or fails), while in the other case there
may be two disk accesses. However, if it weren't for the extra disk I/O
operation, the 'if ...' version might be slightly faster, even though
the exception-based route is more Pythonic.

Jeff Shannon
Technician/Programmer
Credit International
--
http://mail.python.org/mailman/listinfo/python-list
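Jeff's open question is measurable rather than arguable. A rough timeit
harness for the directory-already-exists case -- the scratch_dir name
is invented, and none of this comes from the original script:

import os, timeit

# Ensure the directory exists, so both variants exercise their
# "already there" path on every pass.
if not os.path.isdir('scratch_dir'):
    os.makedirs('scratch_dir')

lbyl = "if not os.path.isdir('scratch_dir'): os.makedirs('scratch_dir')"
eafp = """
try:
    os.makedirs('scratch_dir')
except EnvironmentError:
    pass
"""

for name, stmt in ('LBYL', lbyl), ('EAFP', eafp):
    t = timeit.Timer(stmt, setup='import os')
    print name, min(t.repeat(3, 10000))

Note this times the case where makedirs() always fails; when the
directory usually does not exist, EAFP's advantage should grow.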
Re: Speed ain't bad
Anders J. Munch wrote:
> Another way is the strategy of "it's easier to ask forgiveness than
> to ask permission". If you replace:
>     if(not os.path.isdir(zfdir)):
>         os.makedirs(zfdir)
> with:
>     try:
>         os.makedirs(zfdir)
>     except EnvironmentError:
>         pass
> then not only will your script become a micron more robust, but
> assuming zfdir typically does not exist, you will have saved the call
> to os.path.isdir.

1. Robustness: Both versions will crash (in the sense of an unhandled
exception) in the situation where zfdir exists but is not a directory.
The revised version just crashes later than the OP's version :-(
Trapping EnvironmentError seems not very useful -- the result will not
distinguish (on Windows 2000 at least) between the 'existing dir' and
'existing non-directory' cases.

Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on win32
>>> import os, os.path
>>> os.path.exists('fubar_not_dir')
True
>>> os.path.isdir('fubar_not_dir')
False
>>> os.makedirs('fubar_not_dir')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "c:\Python24\lib\os.py", line 159, in makedirs
    mkdir(name, mode)
OSError: [Errno 17] File exists: 'fubar_not_dir'
>>> try:
...     os.mkdir('fubar_not_dir')
... except EnvironmentError:
...     print 'trapped env err'
...
trapped env err
>>> os.mkdir('fubar_is_dir')
>>> os.mkdir('fubar_is_dir')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OSError: [Errno 17] File exists: 'fubar_is_dir'

2. Efficiency: I don't see the disk I/O inefficiency in calling
os.path.isdir() before os.makedirs() -- if the relevant part of the
filesystem wasn't already in memory, the isdir() call would make it so,
and makedirs() would get a free ride, yes/no?
--
http://mail.python.org/mailman/listinfo/python-list
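If the two cases do need to be distinguished, checking errno and
re-testing with isdir() after the failure is one way. A sketch (the
helper name is ours, not John's):

import os, errno

def ensure_dir(path):
    # Create the directory tree, tolerating only "already exists as a
    # directory"; an existing non-directory still raises.
    try:
        os.makedirs(path)
    except OSError, e:
        if e.errno != errno.EEXIST:
            raise
        if not os.path.isdir(path):
            raise   # exists but is a plain file: John's crash case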
Re: Speed ain't bad
On Sat, 1 Jan 2005 14:20:06 +0100, Anders J. Munch <[EMAIL PROTECTED]> wrote:

>> One of the posters inspired me to do profiling on my newbie script
>> (pasted below). After measurements I have found that the speed of
>> Python, at least in the area where my script works, is surprisingly
>> high.
>
> Pretty good code for someone who calls himself a newbie.

*blush*

> One line that puzzles me:
>     sfile=open(sfpath,'rb')
> You never use sfile again.

Right! It's a leftover from a previous implementation (that used
bzip2). Forgot to delete it, thanks.

> Another way is the strategy of "it's easier to ask forgiveness than
> to ask permission". If you replace:
>     if(not os.path.isdir(zfdir)):
>         os.makedirs(zfdir)
> with:
>     try:
>         os.makedirs(zfdir)
>     except EnvironmentError:
>         pass
> then not only will your script become a micron more robust, but
> assuming zfdir typically does not exist, you will have saved the call
> to os.path.isdir.

Yes, this is the kind of habit that low-level languages like C, missing
features like exceptions, ingrain in the mind of a programmer...
Getting out of this straitjacket is kind of hard -- it would not cross
my mind to try something like what you showed me, thanks!

Exceptions in Python are a GODSEND. I strongly recommend to any former
C programmer wanting to get rid of the straitjacket to read the
following, to get an idea of how not to write C code in Python and
instead exploit the better side of a VHLL:

http://gnosis.cx/TPiP/appendix_a.txt

--
It's a man's life in a Python Programming Association.
--
http://mail.python.org/mailman/listinfo/python-list
Re: Speed ain't bad
Bulba! <[EMAIL PROTECTED]> wrote:
> One of the posters inspired me to do profiling on my newbie script
> (pasted below). After measurements I have found that the speed of
> Python, at least in the area where my script works, is surprisingly
> high.

Pretty good code for someone who calls himself a newbie.

One line that puzzles me:
    sfile=open(sfpath,'rb')
You never use sfile again. In any case, you should explicitly close all
files that you open -- even if there's an exception:

    sfile = open(sfpath, 'rb')
    try:
        ... stuff to do with the file open ...
    finally:
        sfile.close()

> The only thing I'm missing in this picture is knowing whether my
> script could be further optimised (not that I actually need better
> performance, I'm just curious what possible solutions could be). Any
> takers among the experienced guys?

Basically the way to optimise these things is to cut down on anything
that does I/O: use as few calls to os.path.is{dir,file}, os.stat, open
and such as you can get away with. One way to do that is caching, e.g.
storing names of known directories in a set (sets.Set()) and checking
that set before calling os.path.isdir. I haven't spotted any obvious
opportunities for that in your script, though.

Another way is the strategy of "it's easier to ask forgiveness than to
ask permission". If you replace:

    if(not os.path.isdir(zfdir)):
        os.makedirs(zfdir)

with:

    try:
        os.makedirs(zfdir)
    except EnvironmentError:
        pass

then not only will your script become a micron more robust, but
assuming zfdir typically does not exist, you will have saved the call
to os.path.isdir.

- Anders
--
http://mail.python.org/mailman/listinfo/python-list
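For completeness, a sketch of the caching idea Anders describes (the
helper name is invented; sets.Set was the spelling before the built-in
set type arrived in Python 2.4):

import os
from sets import Set

known_dirs = Set()

def make_dir_once(zfdir):
    # Touch the filesystem only the first time we meet a directory;
    # afterwards the in-memory set answers instead of os.path.isdir.
    if zfdir not in known_dirs:
        try:
            os.makedirs(zfdir)
        except EnvironmentError:
            pass
        known_dirs.add(zfdir)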
Re: Speed ain't bad
On Fri, 2004-12-31 at 11:17, Jeremy Bowers wrote:
> I would point out a couple of other ideas, though you may be aware of
> them: Compressing all the files separately, if they are small, may
> greatly reduce the final compression, since similarities between the
> files cannot be exploited.

True; however, it's my understanding that compressing individual files
also means that in the case of damage to the archive, it is possible to
recover the files after the damaged file. This cannot be guaranteed
when the archive is compressed as a single stream.

--
Craig Ringer
--
http://mail.python.org/mailman/listinfo/python-list
Re: Speed ain't bad
Craig Ringer wrote:
> On Fri, 2004-12-31 at 11:17, Jeremy Bowers wrote:
>> I would point out a couple of other ideas, though you may be aware
>> of them: Compressing all the files separately, if they are small,
>> may greatly reduce the final compression, since similarities between
>> the files cannot be exploited.
>
> True; however, it's my understanding that compressing individual
> files also means that in the case of damage to the archive, it is
> possible to recover the files after the damaged file. This cannot be
> guaranteed when the archive is compressed as a single stream.

With gzip, you can forget the entire rest of the stream; with bzip2,
there is a good chance that nothing more than one block (100-900k) is
lost.

regards,
Reinhold
--
http://mail.python.org/mailman/listinfo/python-list
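In Python's bz2 module, the block size Reinhold mentions is selected by
the compresslevel argument: 1 means 100k blocks, 9 means 900k. A
minimal per-file compression sketch (the file names are invented):

import bz2

# compresslevel picks the bzip2 block size in 100k steps: bigger
# blocks compress better, but a damaged block loses more data.
data = open('report.txt', 'rb').read()
out = bz2.BZ2File('report.txt.bz2', 'w', compresslevel=9)
out.write(data)
out.close()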
Re: Speed ain't bad
On Fri, 31 Dec 2004 13:19:44 +0100, Reinhold Birkenfeld <[EMAIL PROTECTED]> wrote:

>> True; however, it's my understanding that compressing individual
>> files also means that in the case of damage to the archive, it is
>> possible to recover the files after the damaged file. This cannot be
>> guaranteed when the archive is compressed as a single stream.
>
> With gzip, you can forget the entire rest of the stream; with bzip2,
> there is a good chance that nothing more than one block (100-900k) is
> lost.

I actually wrote a version of that script that used bzip2, but it was
so horribly slow that I chose the zip version.

--
It's a man's life in a Python Programming Association.
--
http://mail.python.org/mailman/listinfo/python-list
Re: Speed ain't bad
On Thu, 30 Dec 2004 22:17:10 -0500, Jeremy Bowers <[EMAIL PROTECTED]> wrote:
> I would point out a couple of other ideas, though you may be aware of
> them: Compressing all the files separately, if they are small, may
> greatly reduce the final compression, since similarities between the
> files cannot be exploited. You may not care.

The problem is about easy recovery of individual files, plus storing
(not deleting) the older versions of files for some time -- users of
the file servers tend to come around crying "I accidentally deleted
this important file I created a week ago, where can I find it?". The
way it is done, I can expose the directory hierarchy as read-only to
users and they can get the damn file themselves; they just need to
unzip it. If they had to search through one huge zipfile to find it,
that could be a problem for them.

--
It's a man's life in a Python Programming Association.
--
http://mail.python.org/mailman/listinfo/python-list
Re: Speed ain't bad
Bulba! <[EMAIL PROTECTED]> writes:
> The only thing I'm missing in this picture is knowing whether my
> script could be further optimised (not that I actually need better
> performance, I'm just curious what possible solutions could be). Any
> takers among the experienced guys?

There's another compression program called LHN which is supposed to be
quite a bit faster than gzip, though with somewhat worse compression.
I haven't gotten around to trying it.
--
http://mail.python.org/mailman/listinfo/python-list
Speed ain't bad
One of the posters inspired me to do profiling on my newbie script
(pasted below). After measurements I have found that the speed of
Python, at least in the area where my script works, is surprisingly
high.

This is the experiment: a script recreates the folder hierarchy
somewhere else and stores there the compressed versions of files from
the source hierarchy (the script does additional backups of the disk of
a file server at the company where I work onto other disks, with
compression for the sake of saving space).

The data was: 468 MB, 15057 files, 1568 folders (machine: win2k,
Python v2.3.3).

The time that WinRAR v3.20 (with ZIP format and normal compression set)
needed to compress all that was 119 seconds. The Python script's time
(running under the profiler) was, drumroll... 198 seconds. Note that
the Python script had to laboriously recreate the tree of 1568 folders
and create over 15 thousand compressed files, so it actually had more
work to do than WinRAR did. The size of the compressed data was
basically the same, about 207 MB.

I find it very encouraging that in a real-world area of application a
newbie script written in a very high-level language can have
performance that is not that far from the performance of a shrinkwrap
pro archiver (WinRAR is an excellent archiver, both when it comes to
compression and to speed). I do realize that this is mainly the result
of all the underlying infrastructure of Python. Great work, guys.
Congrats.

The only thing I'm missing in this picture is knowing whether my script
could be further optimised (not that I actually need better
performance, I'm just curious what possible solutions could be). Any
takers among the experienced guys?

Profiling results:

>>> p3.sort_stats('cumulative').print_stats(40)
Fri Dec 31 01:04:14 2004    p3.tmp

         580543 function calls (568607 primitive calls) in 198.124 CPU seconds

   Ordered by: cumulative time
   List reduced from 69 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.013    0.013  198.124  198.124 profile:0(z3())
        1    0.000    0.000  198.110  198.110 <string>:1(?)
        1    0.000    0.000  198.110  198.110 <interactive input>:1(z3)
        1    1.513    1.513  198.110  198.110 zmtree3.py:26(zmtree)
    15057   14.504    0.001  186.961    0.012 zmtree3.py:7(zf)
    15057  147.582    0.010  148.778    0.010 C:\Python23\lib\zipfile.py:388(write)
    15057   12.156    0.001   12.156    0.001 C:\Python23\lib\zipfile.py:182(__init__)
    32002    7.957    0.000    8.542    0.000 C:\PYTHON23\Lib\ntpath.py:266(isdir)
13826/1890   2.550    0.000    8.143    0.004 C:\Python23\lib\os.py:206(walk)
    30114    3.164    0.000    3.164    0.000 C:\Python23\lib\zipfile.py:483(close)
    60228    1.753    0.000    2.149    0.000 C:\PYTHON23\Lib\ntpath.py:157(split)
    45171    0.538    0.000    2.116    0.000 C:\PYTHON23\Lib\ntpath.py:197(basename)
    15057    1.285    0.000    1.917    0.000 C:\PYTHON23\Lib\ntpath.py:467(abspath)
    33890    0.688    0.000    1.419    0.000 C:\PYTHON23\Lib\ntpath.py:58(join)
   109175    0.783    0.000    0.783    0.000 C:\PYTHON23\Lib\ntpath.py:115(splitdrive)
    15057    0.196    0.000    0.768    0.000 C:\PYTHON23\Lib\ntpath.py:204(dirname)
    33890    0.433    0.000    0.731    0.000 C:\PYTHON23\Lib\ntpath.py:50(isabs)
    15057    0.544    0.000    0.632    0.000 C:\PYTHON23\Lib\ntpath.py:438(normpath)
    32002    0.431    0.000    0.585    0.000 C:\PYTHON23\Lib\stat.py:45(S_ISDIR)
    15057    0.555    0.000    0.555    0.000 C:\Python23\lib\zipfile.py:149(FileHeader)
    15057    0.483    0.000    0.483    0.000 C:\Python23\lib\zipfile.py:116(__init__)
      151    0.002    0.000    0.435    0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:171(write)
      151    0.002    0.000    0.432    0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:489(write)
      151    0.013    0.000    0.430    0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:461(HandleOutput)
       76    0.087    0.001    0.405    0.005 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:430(QueueFlush)
    15057    0.239    0.000    0.340    0.000 C:\Python23\lib\zipfile.py:479(__del__)
    15057    0.157    0.000    0.157    0.000 C:\Python23\lib\zipfile.py:371(_writecheck)
    32002    0.154    0.000    0.154    0.000 C:\PYTHON23\Lib\stat.py:29(S_IFMT)
       76    0.007    0.000    0.146    0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:262(dowrite)
       76    0.007    0.000    0.137    0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\formatter.py:221(OnStyleNeeded)
       76    0.011    0.000    0.118    0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\interact.py:197(Colorize)
       76    0.110    0.001    0.112    0.001 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:69(SCIInsertText)
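The script itself was not preserved in this archive, so here is a
speculative reconstruction of its shape -- only the names zmtree and zf
appear in the profile; every body below is a guess:

import os, zipfile

def zf(sfpath, zfpath):
    # One source file -> one single-member zip archive (guessed).
    z = zipfile.ZipFile(zfpath, 'w', zipfile.ZIP_DEFLATED)
    z.write(sfpath, os.path.basename(sfpath))
    z.close()

def zmtree(srcroot, dstroot):
    # Recreate the folder tree, then compress each file into it
    # (guessed from the walk/isdir calls visible in the profile).
    for dirpath, dirnames, filenames in os.walk(srcroot):
        zfdir = os.path.join(dstroot, dirpath[len(srcroot) + 1:])
        if not os.path.isdir(zfdir):
            os.makedirs(zfdir)
        for name in filenames:
            zf(os.path.join(dirpath, name),
               os.path.join(zfdir, name + '.zip'))

Read against that shape, the profile is unambiguous: zipfile's write()
accounts for roughly 148 of the 198 seconds, so the compression itself,
not the path juggling, is where the time goes.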
Re: Speed ain't bad
On Fri, 31 Dec 2004 01:41:13 +0100, Bulba! wrote:
> One of the posters inspired me to do profiling on my newbie script
> (pasted below). After measurements I have found that the speed of
> Python, at least in the area where my script works, is surprisingly
> high.
>
> This is the experiment: a script recreates the folder hierarchy
> somewhere else and stores there the compressed versions of files from
> the source hierarchy (the script does additional backups of the disk
> of a file server at the company where I work onto other disks, with
> compression for the sake of saving space). The data was:

I did not study your script, but odds are it is strongly disk-bound.
This means that the disk access time is so large that it completely
swamps almost everything else.

I would point out a couple of other ideas, though you may be aware of
them:

Compressing all the files separately, if they are small, may greatly
reduce the final compression, since similarities between the files
cannot be exploited. You may not care.

Also, the zip format can be updated on a file-by-file basis; it may do
all by itself what you are trying to do, with just a single command
line. Just a thought.
--
http://mail.python.org/mailman/listinfo/python-list
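Python's zipfile module exposes the same file-by-file updating through
append mode. A small sketch of appending one member to an existing
archive (the file names are invented):

import zipfile

# 'a' opens an existing archive for appending; each write() adds one
# member without recompressing what is already there.
z = zipfile.ZipFile('backup.zip', 'a', zipfile.ZIP_DEFLATED)
z.write('newfile.txt')
z.close()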