On 3/28/2011 12:37 PM Albert-Jan Roskam said...
Hi Stefan,

Thanks for your advice. I seriously thought ctypes was the module to
use. That was before I found out that evaluating all 10**9 values of my
test data set is glacially slow (several hours). You're right, the dll
implies the program is running on Windows. I've also been trying to make
it work under Linux but I wanted to get the basic algorithm right first.
Also, it was quite a PIA to get all the dependencies of the (old) .so files.

Your speed tip reminded me of:
http://wiki.python.org/moin/PythonSpeed/PerformanceTips#Avoiding_dots
Does this mean that "from ctypes import *"

That's not necessary. You can also make them local by performing the name lookup only once:

  import ctypes
  myXxx = ctypes.Xxx

Then use myXxx going forward.
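
A minimal sketch of that idea, using ctypes.c_double only because it appears in the function quoted further down. The two helper names are made up for illustration, and how much time the local binding saves depends on the loop, so measure before relying on it:

```python
import ctypes

def make_values_dotted(n):
    # Looks up the global name "ctypes" and its attribute "c_double"
    # on every iteration.
    return [ctypes.c_double(i).value for i in range(n)]

def make_values_local(n):
    # Performs the attribute lookup once and binds the result to a
    # local name, which is then reused inside the loop.
    c_double = ctypes.c_double
    return [c_double(i).value for i in range(n)]

# Both variants produce the same values; only the lookup cost differs.
same = make_values_dotted(5) == make_values_local(5)
```

Nothing else changes: "import ctypes" stays, and there is no need for "from ctypes import *".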


gives slightly faster code
than "import ctypes"? If so: wow! I've always avoided the first notation
like the plague.

What do you mean with '... using a constant pointer for numValue' ?

It looks like your function could reuse a value that is set only once: create numValue outside your function and refer to it from inside.

Emile



Is this the byref/pointer object distinction? I replaced the pointer
object with a byref object, which reduced processing time by about 10 %.

Cython might be interesting as a hobby project, but I'm afraid I'll
never get the ICT droids in my office to install that.

Cheers!!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine,
public order, irrigation, roads, a fresh water system, and public
health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


*From:* Stefan Behnel <[email protected]>
*To:* [email protected]
*Sent:* Mon, March 28, 2011 7:43:16 AM
*Subject:* Re: [Tutor] how to optimize this code?

Albert-Jan Roskam, 27.03.2011 21:57:
 > I made a program that reads spss data files. I ran cProfile to see if I can
 > optimize things (see #1 below).

First thing to note here: sort the output by "time", which refers to the
"tottime" column. That will make it more obvious where most time is
really spent.
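
A small sketch of sorting profiler output that way with the standard cProfile and pstats modules. The function work() is a hypothetical stand-in for the real reader loop, not code from the original program:

```python
import cProfile
import io
import pstats

def work():
    # Hypothetical stand-in for the code being profiled.
    return sum(i * i for i in range(10000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Collect the report in a string buffer instead of printing to stdout.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
# "time" sorts by internal time, i.e. the "tottime" column.
stats.sort_stats("time").print_stats(5)
report = buf.getvalue()
```

Printing report shows the five entries with the largest tottime first, which excludes time spent in sub-calls.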


 > It seems that the function getValueNumeric is a pain spot (see #2
 > below). This function calls a C function in a dll for each numerical
 > cell value. On the basis of this limited amount of info, what could I do
 > to further optimize the code? I heard about psyco, but I didn't think
 > such tricks would be necessary as the function spssGetValueNumeric is
 > implemented in C already (which should be fast).

The problem is that you are using ctypes to call it. It's useful for
simple things, but it's not usable for performance critical things, such
as calling a C function ten million times in your example. Since you're
saying "dll", is this under Windows? It's a bit more tricky to set up
Cython on that platform than on pretty much all others, since you
additionally need to install a C compiler, but if you want to go that
route, it will reward you with a much faster way to call your C code,
and will allow you to also speed up the code that does the calls.

That being said, see below.


 > ## most time consuming function
 >
 > def getValueNumeric(fh, spssio, varHandle):
 >     numValue = ctypes.c_double()
 >     numValuePtr = ctypes.byref(numValue)
 >     retcode = spssio.spssGetValueNumeric(fh,
 >                                          ctypes.c_double(varHandle),
 >                                          numValuePtr)

You may still be able to make this code a tad faster, by avoiding the
function name lookups on both the ctypes module and "spssio", and by
using a constant pointer for numValue (if you're not using threads).
That may not make enough of a difference, but it should at least be a
little faster.
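
One way these suggestions could be combined, as a sketch only. The real spssio library is not available here, so fake_spssGetValueNumeric is a hypothetical stand-in that writes a double through the pointer; make_getter and the (retcode, value) return shape are also assumptions, since the quoted function is cut off before its return statement:

```python
import ctypes

def fake_spssGetValueNumeric(fh, var_handle, value_ptr):
    # Hypothetical stand-in for spssio.spssGetValueNumeric: it stores
    # a double in the caller's buffer and returns 0 as the retcode.
    src = ctypes.c_double(var_handle.value * 2.0)
    ctypes.memmove(value_ptr, ctypes.byref(src), ctypes.sizeof(src))
    return 0

def make_getter(fh, get_value_numeric):
    # Done once: bind names locally, allocate the output buffer and
    # take a reference to it. The buffer is shared between calls, so
    # this is not safe when several threads use the same getter.
    c_double = ctypes.c_double
    num_value = c_double()
    num_value_ptr = ctypes.byref(num_value)

    def get_value(var_handle):
        # Per call: only the C function call and reading the result.
        retcode = get_value_numeric(fh, c_double(var_handle), num_value_ptr)
        return retcode, num_value.value

    return get_value

getter = make_getter(fh=None, get_value_numeric=fake_spssGetValueNumeric)
results = [getter(handle) for handle in (1, 2, 3)]
```

With the real library, spssio.spssGetValueNumeric would be passed in place of the stand-in, so that attribute lookup also happens only once.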

Stefan

_______________________________________________
Tutor maillist - [email protected] <mailto:[email protected]>
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


