Re: [Pytables-users] What is the result of calling createIndex() on multiple columns?

Francesc Alted Wed, 27 Jun 2012 01:44:18 -0700

On 6/26/12 11:19 PM, Aquil H. Abdullah wrote:
> Hello All,
>
> In my newbie state, I called createIndex on two columns in one of my 
> tables:
>
> import tables
> table_desc = {'timestamp': tables.Time32Col(),
>               'symbol': tables.StringCol(8),
>               'observation': tables.Float32Col()}
> h5f = tables.openFile('test.h5',mode='w')
> group = h5f.createGroup('/','data')
> table = h5f.createTable(group, 'test',table_desc,'Test Table')
> table.cols.timestamp.createIndex()
> table.cols.symbol.createIndex()
> …
>
> Now from what I've been able to find on the internet an index is only 
> associated with one column:
>
> class tables.Index
> Represents the index of a column in a table.
>
> This class is used to keep the indexing information for columns in a 
> Table dataset (see The Table class). It is actually the descendant of the
> Group class (see The Group class), with some added functionality. An 
> Index is always associated with one and only one column in a table.
>
> - PyTables 2.3.1 User's Guide - Library Reference/The Index Class 
> http://pytables.github.com/usersguide/libref.html#indexclassdescr
> - Efficient way to verify that records are unique in Python/PyTables 
> http://stackoverflow.com/questions/1315129/efficient-way-to-verify-that-records-are-unique-in-python-pytables
> - Hints For SQL Users (Creating an index) 
> http://www.pytables.org/moin/HintsForSQLUsers#Creatinganindex
>
> So how does PyTables interpret a table with multiple column indices?


If a table has multiple indexes, PyTables' internal query optimizer will 
try to use them in your queries. It is not always possible for PyTables 
to use all of the indexes in a single query, though. Please see:

http://pytables.github.com/usersguide/optimization.html#indexed-searches

for a series of examples where different indexes can be used.
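
As a rough illustration, here is a minimal sketch of a query that touches 
both indexed columns. It assumes the snake_case method names of PyTables 
3.x (open_file, create_index, will_query_use_indexing) rather than the 
camelCase names used in this thread, and the sample rows are made up:

```python
import os
import tempfile

import tables


class Observation(tables.IsDescription):
    timestamp = tables.Time32Col()
    symbol = tables.StringCol(8)
    observation = tables.Float32Col()


# Hypothetical sample rows: (timestamp, symbol, observation)
samples = [(100, b"IBM", 1.0), (200, b"IBM", 2.0),
           (300, b"IBM", 3.0), (200, b"AAPL", 4.0)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.h5")
    with tables.open_file(path, mode="w") as h5f:
        group = h5f.create_group("/", "data")
        table = h5f.create_table(group, "test", Observation, "Test Table")

        # Fill the table first, then build one index per column.
        row = table.row
        for ts, sym, obs in samples:
            row["timestamp"] = ts
            row["symbol"] = sym
            row["observation"] = obs
            row.append()
        table.flush()
        table.cols.timestamp.create_index()
        table.cols.symbol.create_index()

        # A condition that involves both indexed columns.
        condition = "(symbol == sym) & (timestamp >= t0)"
        condvars = {"sym": b"IBM", "t0": 200}

        # Ask the optimizer which indexed columns it could use here.
        usable = table.will_query_use_indexing(condition, condvars)
        print("indexed columns usable for this query:", sorted(usable))

        hits = sorted((int(r["timestamp"]), r["symbol"].decode())
                      for r in table.where(condition, condvars))
        print(hits)
```

The set returned by will_query_use_indexing depends on the condition and 
on the PyTables version, so it is worth checking it for your own queries 
rather than assuming that both indexes are used.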

> The best solution that I've found is creating a hash from the two 
> fields that I am interested in indexing and then indexing that table 
> on that hash.

If several indexes cannot be used in your case, that could be an 
alternative solution for what you are trying to do, yes.
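
One possible way to build such a combined key is sketched below in plain 
Python. The function name composite_key and the bit layout are only 
assumptions for illustration, not something PyTables provides: a 31-bit 
hash of the symbol goes in the high bits and the 32-bit timestamp in the 
low bits, so the result fits in a signed 64-bit integer that could be 
stored in an extra column and indexed like any other column.

```python
import zlib


def composite_key(symbol: str, timestamp: int) -> int:
    """Combine (symbol, timestamp) into one non-negative 63-bit integer."""
    # High bits: CRC32 of the symbol, masked to 31 bits so the final
    # key stays within the range of a signed 64-bit integer.
    sym_hash = zlib.crc32(symbol.encode("ascii")) & 0x7FFFFFFF
    # Low 32 bits: the timestamp, so keys for one symbol sort by time.
    return (sym_hash << 32) | (timestamp & 0xFFFFFFFF)


k_early = composite_key("IBM", 100)
k_late = composite_key("IBM", 200)
print(k_early < k_late)
```

Two different symbols can hash to the same high bits, so rows selected 
through such a key should still be checked against the real symbol 
column.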

>
> The other solution would be to shard my data by symbol and then index 
> each symbol table by timestamp.

The range of possibilities is really large, yes, but I'd try to avoid 
sharding because it is normally harder to set up and manage. Still, you 
are free to try whatever approach you feel is best for you.

HTH,

-- 
Francesc Alted

