Bugs item #1560161, was opened at 2006-09-17 14:09
Message generated for change (Comment added) made by einsteinmg
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1560161&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
>Resolution: None
Priority: 5
Submitted By: Michael Gebetsroither (einsteinmg)
Assigned to: Nobody/Anonymous (nobody)
Summary: Better/faster implementation of os.path.split

Initial Comment:
hi,

os.path.split is quite bad regarding performance on 
long pathnames:

def split(p):
    i = p.rfind('/') + 1
    head, tail = p[:i], p[i:]
    if head and head != '/'*len(head):
        head = head.rstrip('/')
    return head, tail

especially this: '/'*len(head)
this constructs an unnecessary string, sometimes 
thousands of chars long.

better would be:
if head and len(head) != head.count('/'):
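
a small sketch to check that the two tests agree. the helper names 
below are made up for illustration and are not part of posixpath:

```python
def all_slashes_by_compare(head):
    # test as used in the current code: builds a temporary string
    # that is as long as head, just to compare against it
    return head == '/' * len(head)

def all_slashes_by_count(head):
    # suggested test: counts the separators, no temporary string
    return head.count('/') == len(head)

samples = ['', '/', '///', '/usr/', '//usr//', 'usr/']
# (sample, result of old test, result of new test)
results = [(s, all_slashes_by_compare(s), all_slashes_by_count(s))
           for s in samples]
```

both helpers should return the same value for every sample, so the 
count() form can replace the comparison without changing behaviour.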

BUT:
what is this 'if head and head != '/'*len(head):' for?
this if is imho useless, because
if head exists and is not all '/' => rstrip '/'

imho better would be:
rstrip '/' from head, and if that leaves head empty although p 
contained a '/', set head to '/'.
this would have the same effect, because a single '/' is just 
the same path as '/'*len(head).

def split(p):
    i = p.rfind('/') + 1
    head, tail = p[:i], p[i:]
    head = head.rstrip('/')
    # i == 0 means p had no '/' at all, so head has to stay empty
    if i and not head:
        head = '/'
    return head, tail

such an implementation would be way faster for long 
pathnames.
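
a quick sketch for comparing both versions on a few paths. the names 
split_orig and split_new are only for illustration, and the "i and" 
guard in split_new is an assumption added here: without it a path 
with no '/' at all (like 'foo') would get '/' as head.

```python
def split_orig(p):
    # the current posixpath.split as quoted in this report
    i = p.rfind('/') + 1
    head, tail = p[:i], p[i:]
    if head and head != '/' * len(head):
        head = head.rstrip('/')
    return head, tail

def split_new(p):
    # proposed variant; the "i and" guard is an assumption (see above)
    i = p.rfind('/') + 1
    head, tail = p[:i], p[i:]
    head = head.rstrip('/')
    if i and not head:
        head = '/'
    return head, tail

paths = ['', 'foo', '/', '/foo', 'a/b', 'a//b', '/a/b/', '//foo', '///']
# paths where the two versions return different results
differing = [p for p in paths if split_orig(p) != split_new(p)]
print(differing)
```

the only paths that should differ are the ones whose head is made of 
several slashes: the current code keeps them all, the new one 
collapses them to a single '/'.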

greets,
michael

----------------------------------------------------------------------

>Comment By: Michael Gebetsroither (einsteinmg)
Date: 2006-09-18 11:25

Message:
Logged In: YES 
user_id=1600082

patch passes all unittests for posixpath.

basename( 310 ) means basename called with path of length 
310

sum = 0.0453672409058 min = 4.19616699219e-05 
posixpath.basename( 310 )
sum = 0.15571641922 min = 0.000146865844727 
posixpath_orig.basename( 310 )

sum = 0.0432558059692 min = 4.10079956055e-05 
posixpath.basename( 106 )
sum = 0.128361940384 min = 0.000113964080811 
posixpath_orig.basename( 106 )

sum = 0.0422701835632 min = 4.10079956055e-05 
posixpath.basename( 21 )
sum = 0.118340730667 min = 0.000111818313599 
posixpath_orig.basename( 21 )

so this optimized basename is about 3 times faster than the 
old one, and the gap gets bigger for longer paths.

sum = 0.124966621399 min = 0.000120878219604 
posixpath.dirname( 310 )
sum = 0.156893730164 min = 0.000144958496094 
posixpath_orig.dirname( 310 )

sum = 0.0986065864563 min = 9.10758972168e-05 
posixpath.dirname( 106 )
sum = 0.117443084717 min = 0.000113964080811 
posixpath_orig.dirname( 106 )

sum = 0.0905299186707 min = 8.89301300049e-05 
posixpath.dirname( 21 )
sum = 0.118889808655 min = 0.000111103057861 
posixpath_orig.dirname( 21 )

optimized dirname is also faster but not that much.
but it saves an allocation which could save a few cycles 
later.
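
the script behind these numbers is not attached. a minimal sketch of 
how such sum/min values could be collected with timeit is below; the 
bench() helper, the shape of the test path and the repeat/number 
values are all assumptions, not the setup used for the results above.

```python
import posixpath
import timeit

def bench(func, path_len, repeat=5, number=1000):
    # build a path of exactly path_len chars: 'a/a/a/...'
    path = ('a/' * path_len)[:path_len]
    timer = timeit.Timer(lambda: func(path))
    # one timing per repetition, each covering `number` calls
    times = timer.repeat(repeat=repeat, number=number)
    return sum(times), min(times)

for n in (21, 106, 310):
    total, best = bench(posixpath.basename, n)
    print("sum = %s min = %s posixpath.basename( %d )" % (total, best, n))
```

to compare against the unpatched code, the same bench() call could be 
pointed at a saved copy of the old module.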

----------------------------------------------------------------------

Comment By: Michael Gebetsroither (einsteinmg)
Date: 2006-09-18 11:08

Message:
Logged In: YES 
user_id=1600082

sorry, haven't benchmarked my solution

----------------------------------------------------------------------
