James Reynolds, 26.05.2011 21:34:
On Thu, May 26, 2011 at 3:07 PM, Stefan Behnel wrote:
Stefan Behnel, 26.05.2011 18:10:
  James Reynolds, 26.05.2011 17:22:

As an intellectual exercise, I wanted to try my hand at writing some
extensions in C.

This is fine for en exercise, and I hope you had fun doing this.

However, for real code, I suggest you use Cython instead. Your module
would have been substantially simpler and likely also faster.

http://cython.org

Oh, and one more thing: it makes it easier to write safe, portable and
versatile code. As others have pointed out, your code has unnecessary bugs.
It also doesn't compile in Python 3 and lacks the ability to calculate the
averages of a set or deque, for example. Instead, it only handles tuples and
lists. That reduces the usefulness of your implementation.

Thank you for point out the above. Could you kindly please point out some of
those unnecessary bugs?

Alan gave a good response here. For example, PyFloat_AsDouble() can fail, you need to handle that.

Your code also does not correctly sum up floating point numbers. See math.fsum() for that.

Here's a possible Cython version of the variance() function:

  from math import fsum

  def variance(seq):
      "Calculate the variance of all FP numbers in a sequence."
      cdef double value, dsum
      cdef Py_ssize_t count = len(seq)

      dsum = fsum(seq)
      average = dsum / count

      return fsum([(average - value) ** 2 for value in seq]) / count

You can avoid the list comprehension (read: creation) in the second call by using a generator expression (remove the angular brackets). However, this is likely to run slower (and requires Cython 0.15 ;), so we are trading memory for speed here.

If you want this implementation to run faster, you need to unfold and inline fsum's algorithm, which you can look up in the CPython sources.

For comparison, here is a less accurate version that does not use fsum() but your own algorithm:

  def variance(seq):
      cdef double value, dsum, sqsum
      cdef Py_ssize_t count = len(seq)

      dsum = 0.0
      for value in seq:
          dsum += value
      average = dsum / count

      sqsum = 0.0
      for value in seq:
          sqsum += (average - value) ** 2
      return sqsum / count

Note that both implementations cannot work with iterators as input as they iterate twice. If you want to support that, you can add

    seq = list(seq)

at the start, which will let us trade memory for this feature. An additional type test condition for seq not being a list or tuple will give you more or less what PySequence_Fast() does.


I'm not sure that it matters that it won't work in sets and deque's, so long
as the documentation is clear, no?

It will matter to the users of your code at some point.


(Which I'm still not sure how to do, just yet)

What do you mean? Doc strings? You can just add them to the module function struct:

http://docs.python.org/py3k/extending/extending.html#the-module-s-method-table-and-initialization-function


But, I did test it for sets and deque and it works just fine.

setx = set([])
print type(setx)
for i in r:
     setx.add(random.choice(r))
print stats.mean(setx)
dequex = deque([])
print type(dequex)
for i in r:
     dequex.append(random.choice(r))
print stats.mean(dequex)

Right. I keep forgetting that PySequence_Fast() actually has a fallback that copies the iterable into a tuple. So you are also trading memory here to make it work for all iterable types. Personally, I'd never use the PySequence_Fast*() API because it isn't really all that fast. It's just faster than normal Python iteration for tuples and lists and also avoids copying in that case. All other cases are worse than generic iteration, and it's not faster than Cython's looping code.


I'll see what I can do about making it work with P3k, I think the only thing
that would need to be changed would be "PyMODINIT_FUNC initstats(void)" I
believe. Please correct me if I'm wrong though.

You need to create a module struct and use a different name for the init function. Another thing that Cython code doesn't have to care about.

Stefan

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Reply via email to