Hi Andrei,

On Thu, 07 Sep 2006 at 16:42 -0400, Smirnov, Andrei wrote:
> Hello everybody!
> 
> I am using Python 2.4, pytables 1.3.2, numarray-1.5.1, hdf5-1.6.5 on
> Linux 2.4.
> 
> Am I right that index creation for string columns is much, much
> slower than index creation for integer columns?

Yes, this is due to slowness in the sorting method for numarray strings:

In [46]:a=numpy.arange(10000, dtype="byte")
In [47]:a.tofile('test.bin')
In [55]:Timer("b=a.sort()", "import numpy;a=numpy.fromfile('test.bin', dtype='S10')").repeat(3,100)
Out[55]:[0.15041804313659668, 0.10844111442565918, 0.10880494117736816]
In [56]:Timer("b=a.sort()", "import numarray.strings; a=numarray.strings.fromfile('test.bin', itemsize=10)").repeat(3,100)
Out[56]:[2.4770519733428955, 2.4263198375701904, 2.4242000579833984]

i.e. sorting strings in numarray is roughly 20x slower than in numpy.
Once numpy is at the core of pytables, the indexing times will
hopefully be much better.

> 
> I could replace the string column with an integer one that holds an
> index into some string table. Is that the best I can do for the moment?

Well, if what you want is to look up strings the way you would look up
keys in a dictionary, you can follow a similar strategy: create a hash
of the string (for example with Python's builtin hash()) and feed this
value to an Int32 (or Int64, if you are on a 64-bit platform) column.
For integers (and, in general, for anything that is not a string), the
sorting speeds of numarray and numpy are similar:

In [57]:Timer("b=a.sort()", "import numarray; a=numarray.fromfile('test.bin', type='Int32')").repeat(3,100)
Out[57]:[0.030822992324829102, 0.03096318244934082, 0.031370878219604492]
In [58]:Timer("b=a.sort()", "import numpy;a=numpy.fromfile('test.bin', dtype='int32')").repeat(3,100)
Out[58]:[0.094920158386230469, 0.038717985153198242, 0.038733959197998047]
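
To make the hashing strategy concrete, here is a minimal sketch in plain
Python, not pytables code. The names string_key, lookup and table are
made up for illustration, and the list of tuples only stands in for a
table with an indexed integer column. It uses zlib.crc32 instead of the
builtin hash(), an assumption on my part: recent Python versions salt
hash() for strings per process, so its values would not be stable across
runs. Since different strings can share a hash, the lookup compares the
stored string as well.

```python
import zlib

def string_key(s):
    """Map a string to a signed 32-bit integer (hypothetical helper)."""
    crc = zlib.crc32(s.encode("utf-8")) & 0xFFFFFFFF
    # Fold the unsigned CRC into the signed range of an Int32 column.
    return crc - 0x100000000 if crc >= 0x80000000 else crc

# Stand-in for a table: (integer key, original string, payload).
rows = [("alpha", 1.0), ("beta", 2.0), ("gamma", 3.0)]
table = [(string_key(name), name, value) for name, value in rows]

def lookup(table, name):
    """Select by integer key first, then confirm the string itself."""
    key = string_key(name)
    # The string comparison rules out rows that merely collide on the key.
    return [row for row in table if row[0] == key and row[1] == name]

matches = lookup(table, "beta")
```

In a real table the integer key column would be the indexed one, so the
slow string comparison only runs on the few rows that match the key.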


HTH,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"



_______________________________________________
Pytables-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pytables-users
