On Friday 03 October 2008, [EMAIL PROTECTED] wrote:
>       I'm writing some utility code that needs to search through some
> presorted HDF5 Arrays to find values.  The fastest way to search is
> with the numpy.ndarray.searchsorted method.  The problem is that some
> of our data was written with Pytables from a list, so calling read on
> the Array returns a python list, as per the flavor, so I need to
> convert it to a numpy.array.

Don't underestimate Python's bisect module, as it typically performs 
better than numpy.searchsorted for single-value lookups like these:

>>> import timeit, numpy
>>> t1 = timeit.Timer("random.randint(0,N)",
...     "N=1000*1000; import random; import numpy; "
...     "r = numpy.arange(N); random.seed(N)")
>>> t2 = timeit.Timer("r.searchsorted(random.randint(0,N))",
...     "N=1000*1000; import random; import numpy; "
...     "r = numpy.arange(N); random.seed(N)")
>>> t3 = timeit.Timer("bisect.bisect(r, random.randint(0,N))",
...     "N=1000*1000; import random; import bisect; "
...     "r = range(N); random.seed(N)")
>>> tref = numpy.array(t1.repeat(3, 100000))
>>> tref
array([ 0.20582199,  0.19276404,  0.19211507])
>>> tnumpy = numpy.array(t2.repeat(3, 100000))
>>> tnumpy
array([ 0.74087381,  0.73261905,  0.72765613])
>>> tbisect = numpy.array(t3.repeat(3, 100000))
>>> tbisect
array([ 0.39502406,  0.39342785,  0.39117408])
>>> (tnumpy - tref) / (tbisect - tref)
array([ 2.82793852,  2.69034569,  2.69036332])

So, on my machine, the bisect module outperforms numpy.searchsorted() by 
more than 2.5x once the cost of random.randint() (tref) is subtracted.
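
Since bisect works directly on a sorted Python list, the list returned 
for a 'python' flavor Array can be searched without any conversion. 
Here is a minimal sketch of an exact-match lookup; find_sorted is a 
hypothetical helper name, and the small list only stands in for real 
data.  Note that bisect.bisect (used in the timing above) is 
bisect_right, while bisect_left corresponds to searchsorted's default 
side='left'.

```python
import bisect


def find_sorted(values, target):
    """Return the leftmost index of target in a presorted sequence, or -1."""
    i = bisect.bisect_left(values, target)
    if i < len(values) and values[i] == target:
        return i
    return -1


# Stand-in for the list that reading a 'python' flavor Array returns.
data = [1, 3, 3, 7, 10]
print(find_sorted(data, 3))   # index of the first 3
print(find_sorted(data, 4))   # -1, since 4 is not present
```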

>       If I can read the array as if it had a numpy flavor, that would save
> me a step to determine if a conversion is needed, and the conversion
> time.
>
>       Is there a way to force PyTables to read a dataset as a different
> flavor than it was saved as?

There are a couple of ways, yes.  The first one is to change the flavor on disk:

>>> import tables
>>> f = tables.openFile('/tmp/test.h5', 'w')
>>> f.createArray('/', 'array', [1,1])
/array (Array(2,)) ''
  atom := Int64Atom(shape=(), dflt=0)
  maindim := 0
  flavor := 'python'
  byteorder := 'little'
  chunkshape := None
>>> f.close()
>>> f = tables.openFile('/tmp/test.h5', 'a')
>>> f.root.array[:]
[1, 1]   # a list is retrieved
>>> f.root.array.flavor = "numpy"   # change the flavor
>>> f.root.array[:]
array([1, 1])   # now, a numpy array is retrieved
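
If the flavor change should only last for one read, the assignment 
above could be wrapped so that the old flavor is put back afterwards. 
This is only a sketch: temporary_flavor is a hypothetical helper, not 
part of PyTables, and FakeArray is a stand-in so the example runs 
without a file.  With a real Array I would assume the file must be open 
in a writable mode, as in the session above, because the flavor 
attribute is stored in the file.

```python
from contextlib import contextmanager


@contextmanager
def temporary_flavor(node, flavor):
    """Set node.flavor inside the with-block, then restore the old value."""
    original = node.flavor
    node.flavor = flavor
    try:
        yield node
    finally:
        node.flavor = original


class FakeArray(object):
    """Hypothetical stand-in for a PyTables Array; it only has a flavor."""

    def __init__(self, flavor):
        self.flavor = flavor


arr = FakeArray("python")
with temporary_flavor(arr, "numpy"):
    print(arr.flavor)   # "numpy" while inside the block
print(arr.flavor)       # back to "python" afterwards
```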


If you don't want to change the original flavors, then you can use the 
restrict_flavors() function:

>>> f = tables.openFile('/tmp/test.h5', 'w')
>>> f.createArray('/', 'array', [1,1])
/array (Array(2,)) ''
  atom := Int64Atom(shape=(), dflt=0)
  maindim := 0
  flavor := 'python'
  byteorder := 'little'
  chunkshape := None
>>> f.close()
>>> f = tables.openFile('/tmp/test.h5', 'r')
>>> f.root.array[:]
[1, 1]   # retrieve as list
>>> tables.restrict_flavors([])   # don't allow other flavors than numpy
>>> f.root.array[:]
/usr/lib64/python2.5/site-packages/tables/flavor.py:147: FlavorWarning: 
conversion from flavor ``numpy`` to flavor ``python`` is unsupported or 
unavailable in this system; returning an object of the ``numpy`` flavor 
instead
  % (fe.args[0], src_flavor), FlavorWarning )
array([1, 1])    # numpy object (a first warning is issued)
>>> f.root.array[:]
array([1, 1])   # numpy object (no more warnings)
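
If that first warning is unwanted, the standard warnings module can 
filter it out before reading.  The sketch below matches on the start of 
the message text shown above rather than on the warning class, so it 
needs nothing beyond the standard library; silence_flavor_warning is a 
hypothetical helper name.

```python
import warnings


def silence_flavor_warning():
    """Ignore warnings whose message starts like the FlavorWarning above."""
    # The message argument is a regex matched against the start of the text.
    warnings.filterwarnings("ignore", message=r"conversion from flavor ")


silence_flavor_warning()
```

Call it once, before the first read that would trigger the conversion 
warning.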

>       Looking at the definition of the Array.read method, it already does
> a conversion from the internal flavor (which is numpy in our case) to
> the final flavor.  I just need it to not convert from internal to the
> array's flavor.  My current workaround is to just use the first two
> lines of the three-line method:
>
> array._read(*array._processRangeRead(None,None,None))
>
> But I'd rather use a method in the official API.

Hope that helps,

-- 
Francesc Alted
Freelance developer
Tel +34-964-282-249

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
