On 6/26/12 11:19 PM, Aquil H. Abdullah wrote:
> Hello All,
>
> In my newbist state, I called createIndex on two columns in one of my
> tables:
>
> import tables
> table_desc = {'timestamp':tables.Time32Col(),
> 'symbol':tables.StringCol(8), 'observation':tables.Float32Col()}
> h5f = tables.openFile('test.h5',mode='w')
> group = h5f.createGroup('/','data')
> table = h5f.createTable(group, 'test',table_desc,'Test Table')
> table.cols.timestamp.createIndex()
> table.cols.symbol.createIndex()
> …
>
> Now from what I've been able to find on the internet an index is only
> associated with one column:
>
> class tables.Index
> Represents the index of a column in a table.
>
> This class is used to keep the indexing information for columns in a
> Table dataset (see The Table class). It is actually the descendant of the
> Group class (see The Group class), with some added functionality. An
> Index is always associated with one and only one column in a table.
>
> - PyTables 2.3.1 User's Guide - Library Reference/The Index Class
> http://pytables.github.com/usersguide/libref.html#indexclassdescr
> - Efficient way to verify that records are unique in Python/PyTables
> http://stackoverflow.com/questions/1315129/efficient-way-to-verify-that-records-are-unique-in-python-pytables
> - Hints For SQL Users (Creating an index)
> http://www.pytables.org/moin/HintsForSQLUsers#Creatinganindex
>
> So how does PyTables interpret a table with multiple column indices?

If a table has multiple indexes, PyTables' internal query optimizer will try to use them in your queries. It is not always possible for PyTables to use all of the indexes, though. Please see:

http://pytables.github.com/usersguide/optimization.html#indexed-searches

for a series of examples where different indexes can be used.

> The best solution that I've found is creating a hash from the two
> fields that I am interested in indexing and then indexing that table
> on that hash.

If several indexes cannot be used in your case, that could be an alternative solution for what you are trying to do, yes.

>
> The other solution would be to shard my data by symbol and then index
> each symbol table by timestamp.

The range of possibilities is really large, yes. I'd try to avoid sharding, because it is normally harder to set up and manage, but you are of course free to try whichever approach you feel is best for you.

HTH,

-- Francesc Alted

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
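
For reference, the combined-hash idea discussed above could be sketched roughly as follows. The helper name composite_key, the choice of BLAKE2b, and the 64-bit key width are all assumptions made for illustration; none of this is provided by PyTables. The sketch only shows how the two fields might be folded into one integer that could then be stored in an extra integer column (for example an Int64Col) and indexed like any other column.

```python
import hashlib
import struct


def composite_key(symbol, timestamp):
    """Hypothetical helper: fold (symbol, timestamp) into one signed 64-bit key."""
    # "8s" pads or truncates the symbol to 8 bytes (mirroring StringCol(8));
    # "i" packs the timestamp as a 32-bit integer (mirroring Time32Col).
    payload = struct.pack("<8si", symbol.encode("ascii"), int(timestamp))
    # An 8-byte digest is an arbitrary choice so the key fits a 64-bit column.
    digest = hashlib.blake2b(payload, digest_size=8).digest()
    # Read the digest as a signed integer so it fits a signed 64-bit column.
    return int.from_bytes(digest, "little", signed=True)


# Example: the same (symbol, timestamp) pair always maps to the same key.
key = composite_key("AAPL", 1340752740)
```

Since distinct pairs can in principle hash to the same key, a lookup on the key column should still compare symbol and timestamp on the rows it returns.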