Hi Francesc,

Thanks for your help in resolving this confusion.
Does your answer imply that even if I had numpy installed and somehow enabled, my problem with string column indexes would not go away?

Regards,
Andrei

PS: Have you seen this site? http://wilmott.com/categories.cfm?catid=10
People there need something to deal with huge datasets, and some of them are familiar with Python. If you figure out how to speed up building indexes on string columns (they are widely used because almost all traded securities are identified by string IDs, not integers), you are going to be rich very soon :)

PPS: This is instead of a success story. Hopefully I'll have one to share sooner or later.

-----Original Message-----
From: Francesc Altet [mailto:[EMAIL PROTECTED]
Sent: Friday, September 08, 2006 3:11 AM
To: Smirnov, Andrei
Cc: '[email protected]'
Subject: Re: [Pytables-users] String columns indexing

Hi Andrei,

On Thu, 7 Sep 2006 at 16:42 -0400, Smirnov, Andrei wrote:
> Hello everybody!
>
> I am using Python 2.4, pytables 1.3.2, numarray-1.5.1, hdf5-1.6.5 on
> Linux 2.4.
>
> Am I right about the speed of index creation for string columns: it is
> much, much slower compared with integer column index creation?

Yes, this is due to the slowness of the sorting method for numarray strings:

In [46]:a=numpy.arange(10000, dtype="byte")

In [47]:a.tofile('test.bin')

In [55]:Timer("b=a.sort()", "import numpy;a=numpy.fromfile('test.bin', dtype='S10')").repeat(3,100)
Out[55]:[0.15041804313659668, 0.10844111442565918, 0.10880494117736816]

In [56]:Timer("b=a.sort()", "import numarray.strings; a=numarray.strings.fromfile('test.bin', itemsize=10)").repeat(3,100)
Out[56]:[2.4770519733428955, 2.4263198375701904, 2.4242000579833984]

i.e. numarray sorting for strings is about 20x slower than numpy sorting for strings. Of course, once numpy is at the core of pytables, the indexing times will hopefully be much better.

>
> I could replace the string column with an integer one which contains an
> index into some string table. Is that the best I can do for the moment?
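A minimal sketch of the workaround Andrei describes (an integer column holding positions in a table of strings), written against present-day NumPy rather than numarray. That substitution, and the helper names encode_ids and lookup_code, are assumptions for illustration, not part of PyTables. Because np.unique returns the distinct IDs in sorted order, the integer codes sort in the same order as the strings, so an index on the integer column can stand in for one on the string column.

```python
import numpy as np

def encode_ids(ids):
    """Dictionary-encode string IDs into int32 codes (hypothetical helper).

    Returns (table, codes): table holds the distinct IDs in sorted order,
    and codes[i] is the position of ids[i] within table.
    """
    # np.unique sorts the distinct values, so code order matches string order
    table, codes = np.unique(np.asarray(ids, dtype="S10"), return_inverse=True)
    return table, codes.astype(np.int32)

def lookup_code(table, key):
    """Return the integer code for a string key, or -1 if it is absent."""
    if isinstance(key, str):
        key = key.encode("ascii")  # the table stores fixed-width bytes
    # Binary search works because table is sorted
    pos = int(np.searchsorted(table, key))
    if pos < len(table) and table[pos] == key:
        return pos
    return -1

ids = ["IBM", "MSFT", "AAPL", "IBM", "CSGN"]
table, codes = encode_ids(ids)
print(table, codes)             # codes is [2, 3, 0, 2, 1]
print(lookup_code(table, "IBM"))
```

The strings are sorted once while building the table; after that, the per-row column is plain int32. Adding a new ID later would shift the code of every ID that sorts after it, so this sketch suits data that is encoded in one pass.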
Well, if what you want is to search strings as keys in a dictionary, you can follow a similar strategy: create a hash of the string (for example with Python's builtin hash()) and feed that value into an Int32 column (or Int64, if you are on a 64-bit platform). For integers (and, in general, for anything that is not a string), the sorting speed of numarray and numpy is similar:

In [57]:Timer("b=a.sort()", "import numarray; a=numarray.fromfile('test.bin', type='Int32')").repeat(3,100)
Out[57]:[0.030822992324829102, 0.03096318244934082, 0.031370878219604492]

In [58]:Timer("b=a.sort()", "import numpy;a=numpy.fromfile('test.bin', dtype='int32')").repeat(3,100)
Out[58]:[0.094920158386230469, 0.038717985153198242, 0.038733959197998047]

HTH,

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

_______________________________________________
Pytables-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pytables-users
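Francesc's hashing suggestion can be sketched as follows. One caveat for anyone trying it today: since Python 3.3, the builtin hash() of a str is randomized per process (see PYTHONHASHSEED), so a value stored on disk would not match a lookup made in a later session. This sketch therefore swaps in hashlib.blake2b truncated to 8 bytes, which fits an Int64 column. That substitution and the helper names stable_hash64, hash_column and find_rows are assumptions for illustration, not PyTables API; writing the column into a table is not shown.

```python
import hashlib
import numpy as np

def stable_hash64(s):
    """Map a string to a signed 64-bit integer that is stable across runs."""
    # blake2b with an 8-byte digest gives exactly 64 bits of hash output
    digest = hashlib.blake2b(s.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little", signed=True)

def hash_column(ids):
    """Build an int64 array of hashes to store in an integer column."""
    return np.array([stable_hash64(s) for s in ids], dtype=np.int64)

def find_rows(ids, hashes, key):
    """Return the row numbers whose ID equals key.

    The hash comparison narrows the candidates; each candidate is then
    re-checked against the original string, since distinct strings can
    share a hash value.
    """
    candidates = np.nonzero(hashes == stable_hash64(key))[0]
    return [int(i) for i in candidates if ids[i] == key]

ids = ["IBM", "MSFT", "AAPL", "IBM"]
hashes = hash_column(ids)
print(find_rows(ids, hashes, "IBM"))
```

Hash values do not preserve string order, so this supports equality lookups only (no range queries), and each hash match is re-checked against the stored string because distinct strings can collide.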
