Dino Viehland wrote:
Ok, I looked into a bunch of these and here's what I've discovered so far and
other random comments...
Exceptions (100000): 40% slower
IP1: 4703
IP2: 6125
Py: 266
I haven't looked at this one yet. I do know that we have a number of bug fixes
for our exception handling which will slow it down, though. I don't consider
this to be a high priority. If we wanted to focus on exception perf I
think we'd want to do something radical rather than small tweaks to the
existing code. If there are certain scenarios where exception perf is critical,
though, it'd be interesting to hear about those and whether we can do anything
to improve them.
I can look at this.
Engine execution: 8000% slower!!
IP1: 1600
IP2: 115002
This is just a silly bug. We're doing a tree re-write of the AST and we do
that every time through. Caching that re-write gets us back to 1.x
performance. I have a fix for this.
Great! (1.x performance was very impressive.)
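As a rough illustration of the fix described above (rewriting once and reusing the result), here is a minimal sketch in plain Python. The `CompiledCode` class and its methods are hypothetical stand-ins, not IronPython's actual engine types; the "rewrite" is just a placeholder transformation that counts how often it runs.

```python
class CompiledCode(object):
    """Hypothetical container for parsed code; not IronPython's real type."""

    def __init__(self, tree):
        self.tree = tree
        self._rewritten = None   # cached result of the rewrite, if any
        self.rewrite_count = 0   # how many times the rewrite actually ran

    def _rewrite(self):
        # Placeholder for an expensive tree rewrite.
        self.rewrite_count += 1
        return [("rewritten", node) for node in self.tree]

    def execute_uncached(self):
        # Rewrites on every execution (the behaviour described as the bug).
        return len(self._rewrite())

    def execute_cached(self):
        # Rewrites only on first execution and reuses the result afterwards.
        if self._rewritten is None:
            self._rewritten = self._rewrite()
        return len(self._rewritten)


slow = CompiledCode(["a", "b", "c"])
fast = CompiledCode(["a", "b", "c"])
for _ in range(5):
    slow.execute_uncached()
    fast.execute_cached()
```

After five executions `slow.rewrite_count` is 5 while `fast.rewrite_count` is 1, which is the whole point of caching the rewrite.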
Create function: 25% slower
IP1: 2828
IP2: 3640
Py: 2766
Part of this is from a bug fix but the fix could be more efficient. In 1.x we
don't look up __module__ from the global scope. In 2.x we do this lookup but
it searches all scopes - which isn't even correct. But we can do a direct
lookup which is a little faster - so I have a partial fix for this. This will
still be a little slower than 1.x though.
Ok.
Define oldstyle (1 000 000): 33% slower
IP1: 1781
IP2: 2671
Py: 2108
Is this critical? I'd rather just live w/ the slowness rather than fixing
something that will be gone in 3.x :)
Not a problem for us - I merely noted it. In 1.x we needed to switch a
few classes to old style for performance reasons (but we don't
repeatedly redefine them - it was instantiation time). In 2.x we will
need to switch back (which is great).
Lists (10 000): 50% slower
IP1: 10422
IP2: 16109
Py: 6094
The primary issue here is that adding 2 lists ends up creating a new list whose
storage is the exact size needed for storing the two lists. When you append to
the result afterwards we need to allocate a brand new array - and you're not
dealing with small arrays here. We can add a little extra space depending on
the size of the array to minimize the chance of needing a re-size. That gets
us to about 10% slower than CPython. I'm also going to add a strongly typed
extend overload which should make those calls a little faster.
Python lists will typically grow to always have a lot of space. Creating
a list with no extra space seems like a problem. My benchmark for this
was unrealistic though (we add lists and extend them a lot - but
typically they're nothing like that size).
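To make the exact-size versus spare-capacity point concrete, here is a small simulation sketch. It only counts reallocations; the `spare_fraction` knob and the doubling growth policy are assumptions chosen for illustration, not the policy IronPython or CPython actually uses.

```python
def count_reallocations(initial_len, appends, spare_fraction):
    """Count backing-array reallocations while appending to a new list.

    spare_fraction is a hypothetical tuning knob: extra capacity reserved
    when the list is created, as a fraction of its initial length.
    """
    capacity = initial_len + int(initial_len * spare_fraction)
    length = initial_len
    reallocations = 0
    for _ in range(appends):
        if length == capacity:
            # Out of room: allocate a bigger array (doubling is an assumption).
            capacity = max(1, capacity * 2)
            reallocations += 1
        length += 1
    return reallocations


# Two 10 000-element lists added together, then a single append.
exact = count_reallocations(20000, 1, 0.0)     # storage sized exactly
padded = count_reallocations(20000, 1, 0.125)  # 12.5% spare capacity
```

With exact sizing the very first append forces a copy of all 20 000 slots (`exact` is 1); with some spare room it does not (`padded` is 0).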
Sets2 (100 000): 500% slower
IP1: 4984
IP2: 30547
Py: 1203
This one I actually cannot repro yet (I've tried it on 3 machines but they've
all been Vista). I'm going to try next on a Srv 2k3 machine and see if I can
track it down. But more information would be useful.
Hmmm... I wonder if it is an oddity with my machine. Unfortunately I am
not at work today and can't repeat it. I've just run it on Vista (.NET
2.0.50727.3053) running under VMWare Fusion (but on a kick-arse machine).
IP1.1.2: 3515
IP2.0B4: 2516
I need to rerun the whole Resolver port on someone else's machine.
Comparing (== and !=):
IP1: 278597
IP2: 117662
This one is actually pretty interesting (even though we're faster in 2.x) - there's an issue with the test here. You've defined "__neq__" instead of "__ne__".
Ha! Oops. :-)
That causes the != comparison to ultimately compare based upon object identity - which is extremely slow. There might be some things we can do to make the object identity comparison faster (for example, recognizing that we're doing equality and just need an eq or ne answer rather than a 1, -1, 0 comparison value). But I'm going to assume comparing on object identity isn't very important right now - let me know if I'm wrong.
We do use identity comparison a lot - but I'm not sure if it is in
performance critical parts of our code. I can review this.
But switching this to __ne__ causes us to be a little faster than CPython.
They have a great advantage on object identity comparisons - they can just use
the object's address.
Sure.
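A minimal sketch of the `__neq__` versus `__ne__` problem discussed above. The class names and the call counters are made up for illustration. The key point holds on any Python version: `!=` only ever looks for `__ne__`, so a method called `__neq__` is never invoked. (What happens instead differs by version: Python 2 falls back to its default comparison, while Python 3 derives `!=` from `__eq__`.)

```python
class Misnamed(object):
    calls = 0  # how many times __neq__ was invoked

    def __init__(self, val):
        self.val = val

    def __eq__(self, other):
        return self.val == other.val

    def __neq__(self, other):
        # Not a special method: the != operator never calls this.
        Misnamed.calls += 1
        return not self.__eq__(other)


class Correct(object):
    calls = 0  # how many times __ne__ was invoked

    def __init__(self, val):
        self.val = val

    def __eq__(self, other):
        return self.val == other.val

    def __ne__(self, other):
        Correct.calls += 1
        return not self.__eq__(other)


Misnamed(1) != Misnamed(2)
Correct(1) != Correct(2)
```

After running this, `Misnamed.calls` is still 0 and `Correct.calls` is 1.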
I was also curious what happens to this case if we use __slots__. That
identified yet another massive performance regression which I have a fix for -
creating instances that have __slots__ defined is horribly slow. With that bug
fixed and using slots and __ne__ instead of __neq__ we can actually run this
over 2x faster than CPython (on Vista x86 .NET 3.5SP1 on a 2.4 GHz Core 2 w/
4 GB of RAM).
Cool.
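For reference, a sketch of what the "slots and `__ne__`" variant of the test class might look like. This is an assumption about its shape based on the `Thing` class elsewhere in this thread, not the actual benchmark source.

```python
class SlottedThing(object):
    # Fixed attribute layout: instances get no per-instance __dict__.
    __slots__ = ("val",)

    def __init__(self, val):
        self.val = val

    def __eq__(self, other):
        return self.val == other.val

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        return hash(self.val)


a = SlottedThing(1)
b = SlottedThing(2)
has_dict = hasattr(a, "__dict__")

try:
    a.other = 3            # not listed in __slots__
    rejected = False
except AttributeError:
    rejected = True
```

Here `has_dict` is False and `rejected` is True: the class trades the flexibility of arbitrary attributes for a fixed layout.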
Michael
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Michael Foord
Sent: Thursday, August 14, 2008 9:42 AM
To: Discussion of IronPython
Subject: Re: [IronPython] Performance of IronPython 2 Beta 4 and IronPython 1
Just for fun I also compared with CPython. The results are interesting, I'll
turn it into a blog post of course...
Results in milliseconds with a granularity of about 15ms and so an accuracy of
+/- ~60ms.
All testing with 10 000 000 operations unless otherwise stated.
The version of Python I compared against was Python 2.4.
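For readers who want to reproduce numbers of this kind, here is a hedged sketch of a timing harness in portable Python. It is not the harness used for the results in this thread (that one used `System.DateTime`, see the sets2.py listing further down); `time_ms`, `run_empty` and `run_calls` are hypothetical names, and `timeit.default_timer` is swapped in as the clock.

```python
from timeit import default_timer


def time_ms(bench, iterations):
    """Run bench(iterations) once and return the elapsed time in ms."""
    start = default_timer()
    bench(iterations)
    return (default_timer() - start) * 1000.0


def run_empty(n):
    # Loop overhead only, comparable to the "Empty loop" row.
    for _ in range(n):
        pass


def run_calls(n):
    # Loop overhead plus one function call per iteration.
    def f():
        pass
    for _ in range(n):
        f()


overhead_ms = time_ms(run_empty, 100000)
calls_ms = time_ms(run_calls, 100000)
```

Comparing `calls_ms` against `overhead_ms` gives a feel for how much of a row's time is the loop itself; with a coarse clock, differences near the clock's granularity are not meaningful.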
Empty loop (overhead):
IP1: 422
IP2: 438
Py: 3578
Create instance newstyle:
IP1: 20360
IP2: 1109
Py: 4063
Create instance oldstyle:
IP1: 3766
IP2: 3359
Py: 4797
Function call:
IP1: 937
IP2: 906
Py: 3313
Create function: 25% slower
IP1: 2828
IP2: 3640
Py: 2766
Define newstyle (1 000 000):
IP1: 42047
IP2: 20484
Py: 23921
Define oldstyle (1 000 000): 33% slower
IP1: 1781
IP2: 2671
Py: 2108
Comparing (== and !=):
IP1: 278597
IP2: 117662
Py: 62423
Sets:
IP1: 37095
IP2: 30860
Py: 8047
Lists (10 000): 50% slower
IP1: 10422
IP2: 16109
Py: 6094
Recursion (10 000):
IP1: 1125
IP2: 1000
Py: 3609
Sets2 (100 000): 500% slower
IP1: 4984
IP2: 30547
Py: 1203
func_with_args:
IP1: 6312
IP2: 5906
Py: 11250
method_with_args:
IP1: 20594
IP2: 11813
Py: 14875
method_with_kwargs:
IP1: 27953
IP2: 11187
Py: 20032
import: 15% slower
IP1: 28469
IP2: 32000
Py: 25782
global: 20% slower
IP1: 1047
IP2: 1203
Py: 4141
Exceptions (100000): 40% slower
IP1: 4703
IP2: 6125
Py: 266
Engine execution: 8000% slower!!
IP1: 1600
IP2: 115002
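The "% slower" labels in the table are rounded estimates. As a worked example of one way to compute such a figure (relative to the IP1 baseline, which is an assumption about the convention), here is a small helper applied to two of the rows above; `percent_slower` is a hypothetical name.

```python
def percent_slower(baseline_ms, new_ms):
    """Return how much slower new_ms is than baseline_ms, in percent."""
    return (new_ms - baseline_ms) / float(baseline_ms) * 100.0


engine = percent_slower(1600, 115002)  # Engine execution, IP1 vs IP2
sets2 = percent_slower(4984, 30547)    # Sets2, IP1 vs IP2
```

On those figures this gives roughly 7,100% for engine execution and about 513% for Sets2.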
Michael Foord wrote:
Hello all,
I've ported Resolver One to run on IronPython 2 Beta 4 to check for
any potential problems (we will only do a *proper* port once IP 2 is
out of beta).
The basic porting was straightforward and several bugs have been fixed
since IP 2 B3 - many thanks to the IronPython team.
The good news is that Resolver One is only 30-50% slower than Resolver
One on IronPython 1! (It was 300 - 400% slower on top of IP 2 B3.)
Resolver One is fairly heavily optimised around the performance
hotspots of IronPython 1, so we expect to have to do a fair bit of
profiling and refactoring to readjust to the performance profile of IP 2.
Having said that, there are a few oddities (and the areas that slow
down vary tremendously depending on which spreadsheet we use to
benchmark it - making it fairly difficult to track down the hotspots).
We have one particular phase of spreadsheet calculation that takes
0.4 seconds on IP1 and around 6 seconds on IP2, so I have been doing
some micro-benchmarking to try and identify the hotspot. I've
certainly found part of the problem.
For those that are interested I've attached the very basic
microbenchmarks I've been using. The nice thing is that in *general*
IP2 does outperform IP1.
The results that stand out in the other direction are:
Using sets with custom classes (that define '__eq__', '__ne__' and
'__hash__') seems to be 6 times slower in IronPython 2.
Adding lists together is about 50% slower.
Defining functions seems to be 25% slower and defining old style
classes about 33% slower. (Creating instances of new style classes is
massively faster though - thanks!)
The code I used to test sets (sets2.py) is as follows:
from System import DateTime

class Thing(object):
    def __init__(self, val):
        self.val = val

    def __eq__(self, other):
        return self.val == other.val

    def __neq__(self):
        return not self.__eq__(other)

    def __hash__(self):
        return hash(self.val)

def test(s):
    a = set()
    for i in xrange(100000):
        a.add(Thing(i))
        a.add(Thing(i+1))
        Thing(i) in a
        Thing(i+2) in a
    return (DateTime.Now - s).TotalMilliseconds

s = DateTime.Now
print test(s)
Interestingly the time taken is exactly the same if I remove the
definition of '__hash__'.
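One thing worth checking with a benchmark like this is whether the set operations behave as intended, since `__hash__` and `__eq__` together decide whether two `Thing` instances count as the same element. The sketch below is a portable variant of the loop above that counts membership hits instead of timing; `HashedThing` and `count_hits` are hypothetical names. (Without `__hash__`, a Python 2 new-style class falls back to an identity-based hash, so equal-valued instances would not be found; Python 3 makes such a class unhashable.)

```python
class HashedThing(object):
    def __init__(self, val):
        self.val = val

    def __eq__(self, other):
        return self.val == other.val

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        # Equal values hash equally, so the set treats them as one element.
        return hash(self.val)


def count_hits(n):
    """Repeat the sets2 loop, returning (membership hits, final set size)."""
    seen = set()
    hits = 0
    for i in range(n):
        seen.add(HashedThing(i))
        seen.add(HashedThing(i + 1))
        if HashedThing(i) in seen:
            hits += 1
        if HashedThing(i + 2) in seen:
            hits += 1
    return hits, len(seen)


hits, size = count_hits(1000)
```

With value-based hashing, each iteration finds `HashedThing(i)` but not `HashedThing(i + 2)`, so `hits` is 1000 and the set ends up with 1001 elements.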
The full set of results below:
Results in milliseconds with a granularity of about 15ms and so an
accuracy of +/- ~60ms.
All testing with 10 000 000 operations unless otherwise stated.
Empty loop (overhead):
IP1: 421.9
IP2: 438
Create instance newstyle:
IP1: 20360
IP2: 1109
Create instance oldstyle:
IP1: 3766
IP2: 3359
Function call:
IP1: 937
IP2: 906
Create function: 25% slower
IP1: 2828
IP2: 3640
Define newstyle (1 000 000):
IP1: 42047
IP2: 20484
Define oldstyle (1 000 000): 33% slower
IP1: 1781
IP2: 2671
Comparing (== and !=):
IP1: 278597
IP2: 117662
Sets (with numbers):
IP1: 37095
IP2: 30860
Lists (10 000): 50% slower
IP1: 10422
IP2: 16109
Recursion (10 000):
IP1: 1125
IP2: 1000
Sets2 (100 000): 600% slower
IP1: 4984
IP2: 30547
I'll be doing more benchmarking, as the 600% slowdown for sets and the 50%
slowdown for lists account for some of the dependency analysis problem
but not all of it.
Many Thanks
Michael Foord
--
http://www.resolversystems.com
http://www.ironpythoninaction.com
----------------------------------------------------------------------
--
_______________________________________________
Users mailing list
[email protected]
http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/
http://www.trypython.org/
http://www.ironpython.info/
http://www.resolverhacks.net/
http://www.theotherdelia.co.uk/