On Friday 03 October 2008, [EMAIL PROTECTED] wrote:
> I'm writing some utility code that needs to search through some
> presorted HDF5 Arrays to find values. The fastest way to search is
> with the numpy.ndarray.searchsorted method. The problem is that some
> of our data was written with Pytables from a list, so calling read on
> the Array returns a python list, as per the flavor, so I need to
> convert it to a numpy.array.

Don't underestimate the bisect module in Python, as it typically performs
better than numpy.searchsorted for single-value lookups:

>>> import timeit, numpy
>>> t1 = timeit.Timer("random.randint(0,N)",
...     "N=1000*1000; import random; import numpy; r = numpy.arange(N); random.seed(N)")
>>> t2 = timeit.Timer("r.searchsorted(random.randint(0,N))",
...     "N=1000*1000; import random; import numpy; r = numpy.arange(N); random.seed(N)")
>>> t3 = timeit.Timer("bisect.bisect(r, random.randint(0,N))",
...     "N=1000*1000; import random; import bisect; r = range(N); random.seed(N)")
>>> tref = numpy.array(t1.repeat(3, 100000))
>>> tref
array([ 0.20582199,  0.19276404,  0.19211507])
>>> tnumpy = numpy.array(t2.repeat(3, 100000))
>>> tnumpy
array([ 0.74087381,  0.73261905,  0.72765613])
>>> tbisect = numpy.array(t3.repeat(3, 100000))
>>> tbisect
array([ 0.39502406,  0.39342785,  0.39117408])
>>> (tnumpy - tref) / (tbisect - tref)
array([ 2.82793852,  2.69034569,  2.69036332])

So, on my machine, the bisect module outperforms numpy.searchsorted() by
more than 2.5x once the cost of generating the random number (tref) is
subtracted from both timings.

> If I can read the array as if it had a numpy flavor, that would save
> me a step to determine if a conversion is needed, and the conversion
> time.
>
> Is there a way to force PyTables to read a dataset as a different
> flavor than it was saved as?

There are a couple, yes.

The first one is to change the flavor on disk:

>>> import tables
>>> f = tables.openFile('/tmp/test.h5', 'w')
>>> f.createArray('/', 'array', [1,1])
/array (Array(2,)) ''
  atom := Int64Atom(shape=(), dflt=0)
  maindim := 0
  flavor := 'python'
  byteorder := 'little'
  chunkshape := None
>>> f.close()
>>> f = tables.openFile('/tmp/test.h5', 'a')
>>> f.root.array[:]
[1, 1]             # a list is retrieved
>>> f.root.array.flavor = "numpy"   # change the flavor
>>> f.root.array[:]
array([1, 1])      # now, a numpy array is retrieved

If you don't want to change the original flavors, then you can use the
restrict_flavors() function:

>>> f = tables.openFile('/tmp/test.h5', 'w')
>>> f.createArray('/', 'array', [1,1])
/array (Array(2,)) ''
  atom := Int64Atom(shape=(), dflt=0)
  maindim := 0
  flavor := 'python'
  byteorder := 'little'
  chunkshape := None
>>> f.close()
>>> f = tables.openFile('/tmp/test.h5', 'r')
>>> f.root.array[:]
[1, 1]             # retrieve as list
>>> tables.restrict_flavors([])   # don't allow other flavors than numpy
>>> f.root.array[:]
/usr/lib64/python2.5/site-packages/tables/flavor.py:147: FlavorWarning:
conversion from flavor ``numpy`` to flavor ``python`` is unsupported or
unavailable in this system; returning an object of the ``numpy`` flavor
instead
  % (fe.args[0], src_flavor), FlavorWarning )
array([1, 1])      # numpy object (a first warning is issued)
>>> f.root.array[:]
array([1, 1])      # numpy object (no more warnings)

> Looking at the definition of the Array.read method, it already does
> a conversion from the internal flavor (which is numpy in our case) to
> the final flavor. I just need it to not convert from internal to the
> array's flavor. My current workaround is to just use the first two
> lines of the three-line method:
>
> array._read(*array._processRangeRead(None,None,None))
>
> But I'd rather use a method in the official API.

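Coming back to the bisect suggestion earlier in this message: bisect works
on any sorted sequence, so a list read from a 'python'-flavored Array can be
searched as it is, with no conversion step. What follows is only a minimal
sketch under that assumption. The helper name find_sorted and the sample
data are hypothetical and are not part of PyTables or of the original
poster's code.

```python
import bisect

def find_sorted(seq, value):
    """Return the index of the first occurrence of value in the sorted
    sequence seq, or -1 if value is not present."""
    # bisect_left gives the leftmost insertion point, which is the index of
    # the first matching element when value is present.
    i = bisect.bisect_left(seq, value)
    if i < len(seq) and seq[i] == value:
        return i
    return -1

# Hypothetical sample standing in for the list returned by Array.read()
data = [1, 3, 3, 7, 10]

print(find_sorted(data, 3))    # 1  (first of the two 3s)
print(find_sorted(data, 4))    # -1 (not present)
print(bisect.bisect(data, 3))  # 3  (insertion point to the right of the 3s)
```

bisect.bisect_left corresponds to searchsorted() with its default
side='left', and bisect.bisect (an alias of bisect_right) corresponds to
side='right', so the indices should match whichever one is used.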
Hope that helps,

-- 
Francesc Alted
Freelance developer