A Tuesday 20 November 2007, Istvan Albert escrigué:
> On Nov 19, 2:33 pm, Francesc Altet <[EMAIL PROTECTED]> wrote:
> > Just for the record.  I was unable to stop thinking about this, and
> > after some investigation, I guess that my rememberings were
> > gathered from some benchmarks with code in Pyrex (i.e. pretty near
> > to C speed).
>
> Pretty interesting and quite unexpected.

Yeah, and also bogus :(

It turned out that in my previous benchmark, I've used plain numpy int 
arrays to do measurements, and when you extract an element out of a 
numpy array what you obtain is a *numpy scalar*, which is a different 
beast than a python (long) integer.  Unfortunately, pyrex does swallow 
it and happily converted to a long long C, but produces a wrong result.  
This looks like a pyrex fault, and I should report it to the pyrex 
list.

At any rate, with the corrected benchmarks using plain python lists 
instead (attached), the numbers are:

Calibrating loop...
Calibrating time: 1.458
Creating the dictionary...
Time for dict creation: 15.305
Timing queries with a dict...
Time for dict query: 11.632
Creating BinarySearch object...
Time for BinarySearch creation: 8.648
Timing queries with a BinarySearch...
Time for BinarySearch queries: 19.393

That is, dictionaries do lookups 1.7x faster than a binary lookup (1.42 
us/lookup vs 2.37 us/lookup), which is, I suppose, what is expected.  
Well, this seems the end of my 'peculiar' theory, but at least served 
to uncover what it seems a bug in pyrex :P

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
# BinarySearch class to demonstrate that binary search is competitive
# against dictionaries based on hashes

import numpy

# In order to access the data area in NumPy objects
cdef extern from "numpy/arrayobject.h":

  ctypedef extern class numpy.ndarray [object PyArrayObject]:
    cdef char *data

  void import_array()

import_array()


cdef long bisect_left(long long *a, long long x, long lo, long hi):
    cdef long mid

    while lo < hi:
        mid = (lo+hi)/2
        if a[mid] < x: lo = mid+1
        else: hi = mid
    return lo


cdef class BinarySearch:
    cdef long len
    cdef object values
    cdef long *revidd
    cdef long long *sidd
    cdef ndarray sid, revid

    def __new__(self, object keys, object values):
        self.sid = numpy.array(keys)
        self.revid = self.sid.argsort()
        self.revidd = <long *>self.revid.data
        self.sid.sort()
        self.sidd = <long long *>self.sid.data
        self.values = values
        self.len = len(values) - 1

    def __getitem__(self, x):
        cdef long pos, idx

        pos = bisect_left(self.sidd, x, 0, self.len)
        idx = self.revidd[pos]
        return self.values[idx]

Attachment: gq-binary-search.py
Description: application/python

Attachment: setup.py
Description: application/python

-- 
http://mail.python.org/mailman/listinfo/python-list

Reply via email to