Re: Speed ain't bad

2005-01-04 Thread Anders J. Munch
John Machin [EMAIL PROTECTED] wrote:
> 1. Robustness: Both versions will crash (in the sense of an unhandled
> 2. Efficiency: I don't see the disk I/O inefficiency in calling

3. Don't itemise perceived flaws in other people's postings. It may
give off a hostile impression.

> 1. Robustness: Both versions will crash (in the sense of an unhandled
> exception) in the situation where zfdir exists but is not a directory.
> The revised version just crashes later than the OP's version :-(
> Trapping EnvironmentError seems not very useful -- the result will not
> distinguish (on Windows 2000 at least) between the 'existing dir' and
> 'existing non-directory' cases.

Good point; my version has room for improvement. But at least it fixes
the race condition between isdir and makedirs.

What I like about EnvironmentError is that it's easier to use than
figuring out which one of IOError or OSError applies (and whether that
can be relied on, cross-platform).
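
Something along these lines (a sketch only; ensure_dir is a made-up
helper, not from the original script) keeps the EAFP style, avoids the
isdir/makedirs race, and still lets the 'exists but is not a directory'
case surface:

import os, os.path

def ensure_dir(path):
    # Hypothetical helper: EAFP version of "create path if missing".
    try:
        os.makedirs(path)
    except EnvironmentError:
        # makedirs failed: either the directory already exists (fine),
        # or something else is wrong (existing non-directory,
        # permissions, ...), in which case re-raise.
        if not os.path.isdir(path):
            raise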

> 2. Efficiency: I don't see the disk I/O inefficiency in calling
> os.path.isdir() before os.makedirs() -- if the relevant part of the
> filesystem wasn't already in memory, the isdir() call would make it
> so, and makedirs() would get a free ride, yes/no?

Perhaps. Looking stuff up in operating system tables and buffers takes
time too. And then there's network latency; how much local caching do
you get for an NFS mount or SMB share?

If you really want to know, measure.
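
For example, a quick-and-dirty timeit comparison along these lines
(the directory name is invented, and the numbers will of course depend
on the OS and filesystem) would settle it for a given machine:

import os, os.path, timeit

DIR = 'existing_dir'            # assumed to already exist for the test
if not os.path.isdir(DIR):
    os.makedirs(DIR)

def lbyl():
    # look before you leap
    if not os.path.isdir(DIR):
        os.makedirs(DIR)

def eafp():
    # easier to ask forgiveness than permission
    try:
        os.makedirs(DIR)
    except EnvironmentError:
        pass

for name in ('lbyl', 'eafp'):
    t = timeit.Timer('%s()' % name, 'from __main__ import %s' % name)
    print name, t.timeit(1000)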

- Anders


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed ain't bad

2005-01-03 Thread Jeff Shannon
Anders J. Munch wrote:
> Another way is the strategy of "it's easier to ask forgiveness than to
> ask permission".
> If you replace:
>     if(not os.path.isdir(zfdir)):
>         os.makedirs(zfdir)
> with:
>     try:
>         os.makedirs(zfdir)
>     except EnvironmentError:
>         pass
> 
> then not only will your script become a micron more robust, but
> assuming zfdir typically does not exist, you will have saved the call
> to os.path.isdir.
... at the cost of an exception frame setup and an incomplete call to 
os.makedirs().  It's an open question whether the exception setup and 
recovery take less time than the call to isdir(), though I'd expect 
probably not.  The exception route definitely makes more sense if the 
makedirs() call is likely to succeed; if it's likely to fail, then 
things are murkier.

Since isdir() *is* a disk i/o operation, in this case the 
exception route is probably preferable anyhow.  In either case, one 
must touch the disk; in the exception case, there will only ever be 
one disk access (which either succeeds or fails), while in the other 
case, there may be two disk accesses.  However, if it wasn't for the 
extra disk i/o operation, then the 'if ...' might be slightly faster, 
even though the exception-based route is more Pythonic.

Jeff Shannon
Technician/Programmer
Credit International
--
http://mail.python.org/mailman/listinfo/python-list


Re: Speed ain't bad

2005-01-03 Thread John Machin
Anders J. Munch wrote:
> Another way is the strategy of "it's easier to ask forgiveness than to
> ask permission".
> If you replace:
>     if(not os.path.isdir(zfdir)):
>         os.makedirs(zfdir)
> with:
>     try:
>         os.makedirs(zfdir)
>     except EnvironmentError:
>         pass
> 
> then not only will your script become a micron more robust, but
> assuming zfdir typically does not exist, you will have saved the call
> to os.path.isdir.

1. Robustness: Both versions will crash (in the sense of an unhandled
exception) in the situation where zfdir exists but is not a directory.
The revised version just crashes later than the OP's version :-(
Trapping EnvironmentError seems not very useful -- the result will not
distinguish (on Windows 2000 at least) between the 'existing dir' and
'existing non-directory' cases.


Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on
win32
>>> import os, os.path
>>> os.path.exists('fubar_not_dir')
True
>>> os.path.isdir('fubar_not_dir')
False
>>> os.makedirs('fubar_not_dir')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "c:\Python24\lib\os.py", line 159, in makedirs
    mkdir(name, mode)
OSError: [Errno 17] File exists: 'fubar_not_dir'
>>> try:
...     os.mkdir('fubar_not_dir')
... except EnvironmentError:
...     print 'trapped env err'
...
trapped env err
>>> os.mkdir('fubar_is_dir')
>>> os.mkdir('fubar_is_dir')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OSError: [Errno 17] File exists: 'fubar_is_dir'


2. Efficiency: I don't see the disk I/O inefficiency in calling
os.path.isdir() before os.makedirs() -- if the relevant part of the
filesystem wasn't already in memory, the isdir() call would make it so,
and makedirs() would get a free ride, yes/no?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed ain't bad

2005-01-02 Thread Bulba!
On Sat, 1 Jan 2005 14:20:06 +0100, Anders J. Munch
[EMAIL PROTECTED] wrote:

>> One of the posters inspired me to do profiling on my newbie script
>> (pasted below). After measurements I have found that the speed
>> of Python, at least in the area where my script works, is surprisingly
>> high.

> Pretty good code for someone who calls himself a newbie.

<blush>

> One line that puzzles me:
>> sfile=open(sfpath,'rb')

> You never use sfile again.

Right! It's a leftover from a previous implementation (that
used bzip2). Forgot to delete it, thanks.

> Another way is the strategy of "it's easier to ask forgiveness than to
> ask permission".
> If you replace:
>     if(not os.path.isdir(zfdir)):
>         os.makedirs(zfdir)
> with:
>     try:
>         os.makedirs(zfdir)
>     except EnvironmentError:
>         pass
> 
> then not only will your script become a micron more robust, but
> assuming zfdir typically does not exist, you will have saved the call
> to os.path.isdir.

Yes, this is the kind of habit that low-level languages like C,
missing features like exceptions, ingrain in the mind of a programmer...

Getting out of this straitjacket is kind of hard - it would not cross
my mind to try something like what you showed me, thanks!

Exceptions in Python are a GODSEND. I strongly recommend that
any former C programmer wanting to get rid of that straitjacket
read the following to get an idea of how not to write C code in Python
and instead exploit the better side of a VHLL:

http://gnosis.cx/TPiP/appendix_a.txt




--
It's a man's life in a Python Programming Association.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed ain't bad

2005-01-01 Thread Anders J. Munch
Bulba! [EMAIL PROTECTED] wrote:

> One of the posters inspired me to do profiling on my newbie script
> (pasted below). After measurements I have found that the speed
> of Python, at least in the area where my script works, is surprisingly
> high.

Pretty good code for someone who calls himself a newbie.

One line that puzzles me:
> sfile=open(sfpath,'rb')

You never use sfile again.
In any case, you should explicitly close all files that you open. Even
if there's an exception:

sfile = open(sfpath, 'rb')
try:
    <stuff to do with the file open>
finally:
    sfile.close()


> The only thing I'm missing in this picture is knowledge if my script
> could be further optimised (not that I actually need better
> performance, I'm just curious what possible solutions could be).
> 
> Any takers among the experienced guys?

Basically the way to optimise these things is to cut down on anything
that does I/O: Use as few calls to os.path.is{dir,file}, os.stat, open
and the like as you can get away with.

One way to do that is caching; e.g. storing names of known directories
in a set (sets.Set()) and checking that set before calling
os.path.isdir.  I haven't spotted any obvious opportunities for that
in your script, though.
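
Just to illustrate the idea (a generic sketch, not taken from your
script):

import os, os.path
from sets import Set    # Python 2.3; in 2.4 the built-in set() works too

known_dirs = Set()

def ensure_dir_cached(path):
    # Only touch the filesystem for directories we haven't seen yet.
    if path in known_dirs:
        return
    if not os.path.isdir(path):
        os.makedirs(path)
    known_dirs.add(path)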

Another way is the strategy of "it's easier to ask forgiveness than to
ask permission".
If you replace:
    if(not os.path.isdir(zfdir)):
        os.makedirs(zfdir)
with:
    try:
        os.makedirs(zfdir)
    except EnvironmentError:
        pass

then not only will your script become a micron more robust, but
assuming zfdir typically does not exist, you will have saved the call
to os.path.isdir.

- Anders


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed ain't bad

2004-12-31 Thread Craig Ringer
On Fri, 2004-12-31 at 11:17, Jeremy Bowers wrote:

> I would point out a couple of other ideas, though you may be aware of
> them: Compressing all the files separately, if they are small, may greatly
> reduce the final compression since similarities between the files cannot
> be exploited.

True; however, it's my understanding that compressing individual files
also means that in the case of damage to the archive it is possible to
recover the files after the damaged file. This cannot be guaranteed when
the archive is compressed as a single stream.

--
Craig Ringer

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed ain't bad

2004-12-31 Thread Reinhold Birkenfeld
Craig Ringer wrote:
> On Fri, 2004-12-31 at 11:17, Jeremy Bowers wrote:
> 
>> I would point out a couple of other ideas, though you may be aware of
>> them: Compressing all the files separately, if they are small, may greatly
>> reduce the final compression since similarities between the files cannot
>> be exploited.
> 
> True; however, it's my understanding that compressing individual files
> also means that in the case of damage to the archive it is possible to
> recover the files after the damaged file. This cannot be guaranteed when
> the archive is compressed as a single stream.

With gzip, you can forget the entire rest of the stream; with bzip2,
there is a good chance that nothing more than one block (100-900k) is lost.

regards,
Reinhold
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed ain't bad

2004-12-31 Thread Bulba!
On Fri, 31 Dec 2004 13:19:44 +0100, Reinhold Birkenfeld
[EMAIL PROTECTED] wrote:

>> True; however, it's my understanding that compressing individual files
>> also means that in the case of damage to the archive it is possible to
>> recover the files after the damaged file. This cannot be guaranteed when
>> the archive is compressed as a single stream.
> 
> With gzip, you can forget the entire rest of the stream; with bzip2,
> there is a good chance that nothing more than one block (100-900k) is lost.

I have actually written the version of that script with bzip2 but 
it was so horribly slow that I chose the zip version.






--
It's a man's life in a Python Programming Association.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed ain't bad

2004-12-31 Thread Bulba!
On Thu, 30 Dec 2004 22:17:10 -0500, Jeremy Bowers [EMAIL PROTECTED]
wrote:

> I would point out a couple of other ideas, though you may be aware of
> them: Compressing all the files separately, if they are small, may greatly
> reduce the final compression since similarities between the files cannot
> be exploited. You may not care.

The problem is about easy recovery of individual files, plus storing
(and not deleting) the older versions of files for some time (users
of the file servers tend to come around crying "I accidentally deleted
this important file I created a week ago, where can I find it?").

The way it is done, I can expose the directory hierarchy as read-only
to users and they can get the damn file themselves, they just need
to unzip it. If they were to search through a huge zipfile to find
it, that could be a problem for them.




--
It's a man's life in a Python Programming Association.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed ain't bad

2004-12-31 Thread Paul Rubin
Bulba! [EMAIL PROTECTED] writes:
> The only thing I'm missing in this picture is knowledge if my script
> could be further optimised (not that I actually need better
> performance, I'm just curious what possible solutions could be).
> 
> Any takers among the experienced guys?

There's another compression program called LHN which is supposed to be
quite a bit faster than gzip, though with somewhat worse compression.
I haven't gotten around to trying it.
-- 
http://mail.python.org/mailman/listinfo/python-list


Speed ain't bad

2004-12-30 Thread Bulba!

One of the posters inspired me to do profiling on my newbie script
(pasted below). After measurements I have found that the speed
of Python, at least in the area where my script works, is surprisingly
high.

This is the experiment: a script recreates the folder hierarchy
somewhere else and stores there the compressed versions of
files from the source hierarchy (the script is doing additional backups
of the disk of the file server at the company where I work onto other
disks, with compression for the sake of saving space). The data was:

468 MB, 15057 files, 1568 folders
(machine: win2k, python v2.3.3)

The time that WinRAR v3.20 (with ZIP format and normal compression
set) needed to compress all that was 119 seconds.

The Python script time (running under the profiler) was, drumroll...

198 seconds.

Note that the Python script had to laboriously recreate the tree of
1568 folders and create over 15 thousand compressed files, so
it actually had more work to do than WinRAR did. The size of the
compressed data was basically the same, about 207 MB.

I find it very encouraging that in a real-world area of application
a newbie script written in a very high-level language can have
performance that is not that far from the performance of a shrinkwrap
pro archiver (WinRAR is an excellent archiver, both when it comes to
compression as well as speed). I do realize that this is mainly
the result of all the underlying infrastructure of Python. Great
work, guys. Congrats.

The only thing I'm missing in this picture is knowledge if my script
could be further optimised (not that I actually need better
performance, I'm just curious what possible solutions could be). 

Any takers among the experienced guys?
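
Roughly, the approach boils down to something like the sketch below:
walk the source tree, mirror each folder, and write one compressed
.zip per file (this is illustrative only, not the actual script that
was profiled; the identifiers and paths are made up):

# Illustrative sketch of the approach, not the profiled script itself.
import os, os.path, zipfile

def backup_tree(src_root, dst_root):
    for dirpath, dirnames, filenames in os.walk(src_root):
        # Recreate the folder hierarchy on the backup disk.
        rel = dirpath[len(src_root):].lstrip(os.sep)
        zfdir = os.path.join(dst_root, rel)
        if not os.path.isdir(zfdir):
            os.makedirs(zfdir)
        # Store one compressed .zip per source file.
        for name in filenames:
            sfpath = os.path.join(dirpath, name)
            zf = zipfile.ZipFile(os.path.join(zfdir, name + '.zip'),
                                 'w', zipfile.ZIP_DEFLATED)
            zf.write(sfpath, name)
            zf.close()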



Profiling results:

>>> p3.sort_stats('cumulative').print_stats(40)
Fri Dec 31 01:04:14 2004    p3.tmp

         580543 function calls (568607 primitive calls) in 198.124 CPU seconds

   Ordered by: cumulative time
   List reduced from 69 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.013    0.013  198.124  198.124 profile:0(z3())
        1    0.000    0.000  198.110  198.110 <string>:1(?)
        1    0.000    0.000  198.110  198.110 <interactive input>:1(z3)
        1    1.513    1.513  198.110  198.110 zmtree3.py:26(zmtree)
    15057   14.504    0.001  186.961    0.012 zmtree3.py:7(zf)
    15057  147.582    0.010  148.778    0.010 C:\Python23\lib\zipfile.py:388(write)
    15057   12.156    0.001   12.156    0.001 C:\Python23\lib\zipfile.py:182(__init__)
    32002    7.957    0.000    8.542    0.000 C:\PYTHON23\Lib\ntpath.py:266(isdir)
13826/1890    2.550    0.000    8.143    0.004 C:\Python23\lib\os.py:206(walk)
    30114    3.164    0.000    3.164    0.000 C:\Python23\lib\zipfile.py:483(close)
    60228    1.753    0.000    2.149    0.000 C:\PYTHON23\Lib\ntpath.py:157(split)
    45171    0.538    0.000    2.116    0.000 C:\PYTHON23\Lib\ntpath.py:197(basename)
    15057    1.285    0.000    1.917    0.000 C:\PYTHON23\Lib\ntpath.py:467(abspath)
    33890    0.688    0.000    1.419    0.000 C:\PYTHON23\Lib\ntpath.py:58(join)
   109175    0.783    0.000    0.783    0.000 C:\PYTHON23\Lib\ntpath.py:115(splitdrive)
    15057    0.196    0.000    0.768    0.000 C:\PYTHON23\Lib\ntpath.py:204(dirname)
    33890    0.433    0.000    0.731    0.000 C:\PYTHON23\Lib\ntpath.py:50(isabs)
    15057    0.544    0.000    0.632    0.000 C:\PYTHON23\Lib\ntpath.py:438(normpath)
    32002    0.431    0.000    0.585    0.000 C:\PYTHON23\Lib\stat.py:45(S_ISDIR)
    15057    0.555    0.000    0.555    0.000 C:\Python23\lib\zipfile.py:149(FileHeader)
    15057    0.483    0.000    0.483    0.000 C:\Python23\lib\zipfile.py:116(__init__)
      151    0.002    0.000    0.435    0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:171(write)
      151    0.002    0.000    0.432    0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:489(write)
      151    0.013    0.000    0.430    0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:461(HandleOutput)
       76    0.087    0.001    0.405    0.005 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:430(QueueFlush)
    15057    0.239    0.000    0.340    0.000 C:\Python23\lib\zipfile.py:479(__del__)
    15057    0.157    0.000    0.157    0.000 C:\Python23\lib\zipfile.py:371(_writecheck)
    32002    0.154    0.000    0.154    0.000 C:\PYTHON23\Lib\stat.py:29(S_IFMT)
       76    0.007    0.000    0.146    0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:262(dowrite)
       76    0.007    0.000    0.137    0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\formatter.py:221(OnStyleNeeded)
       76    0.011    0.000    0.118    0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\interact.py:197(Colorize)
       76    0.110    0.001    0.112    0.001 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:69(SCIInsertText)

Re: Speed ain't bad

2004-12-30 Thread Jeremy Bowers
On Fri, 31 Dec 2004 01:41:13 +0100, Bulba! wrote:

 
> One of the posters inspired me to do profiling on my newbie script (pasted
> below). After measurements I have found that the speed of Python, at least
> in the area where my script works, is surprisingly high.
> 
> This is the experiment: a script recreates the folder hierarchy somewhere
> else and stores there the compressed versions of files from the source
> hierarchy (the script is doing additional backups of the disk of the file
> server at the company where I work onto other disks, with compression for
> the sake of saving space). The data was:

I did not study your script but odds are it is strongly disk bound.

This means that the disk access time is so large that it completely swamps
almost everything else. 

I would point out a couple of other ideas, though you may be aware of
them: Compressing all the files separately, if they are small, may greatly
reduce the final compression since similarities between the files cannot
be exploited. You may not care. Also, the zip format can be updated on a
file-by-file basis; it may do all by itself what you are trying to do,
with just a single command line. Just a thought.
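
For what it's worth, zipfile can also append to an existing archive, so
a single big archive could be grown file by file. A minimal sketch (the
function and file names here are hypothetical, not from your script):

import zipfile

def add_to_archive(archive_path, file_paths):
    # 'a' opens an existing archive for appending (or creates it).
    zf = zipfile.ZipFile(archive_path, 'a', zipfile.ZIP_DEFLATED)
    try:
        for path in file_paths:
            zf.write(path)
    finally:
        zf.close()

# e.g. add_to_archive('backup.zip', ['report.doc', 'data.csv'])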
-- 
http://mail.python.org/mailman/listinfo/python-list