On Nov 17, 2011, at 10:35 PM, Alan Marchiori wrote:

> Hello,

Hi Alan,

> I am attempting to use PyTables (v2.3.1) to store timestamped data and
> things were going well until I added a column index.  While the column
> is indexed no data is returned from a table.where call!
> 
> This behavior is demonstrated with the following test code:
> ---begin test.py---
> import tables
> import random
> 
> class Descr(tables.IsDescription):
>    when = tables.Time64Col(pos = 1)
>    value = tables.Float32Col(pos = 2)
> 
> h5f = tables.openFile('/tmp/tmp.h5', 'w')
> tbl = h5f.createTable('/', 'test', Descr)
> 
> tbl.cols.when.createIndex(_verbose = True)
> 
> t = 1321031471.0  # 11/11/11 11:11:11
> tbl.append([(t + i, random.random()) for i in range(1000)])
> tbl.flush()
> 
> def query(s):
>    print 'is_index =', tbl.cols.when.is_indexed
>    print [(row['when'], row['value']) for row in tbl.where(s)]
>    print tbl.readWhere(s)
> 
> wherestr = '(when >= %d) & (when < %d)'%(t, t+5)
> query(wherestr)
> tbl.cols.when.removeIndex()
> query(wherestr)
> 
> h5f.close()
> ---end test.py---
> 
> This creates the table for storing time/value pairs, inserts some
> synthetic data, and then checks to see if there is data in the table.
> When the table is created there is an index added to the 'when'
> column.  The first query returns no data (which is incorrect).  Then
> the column index is removed (via Column.removeIndex) and the query is
> repeated.  This time 5 results are returned as expected.  The data is
> clearly there; however, the index is somehow breaking the where logic.
> Here is the output I get:
> 
> ---begin output---
> is_index = True
> []
> []
> is_index = False
> [(1321031471.0, 0.6449417471885681), (1321031472.0,
> 0.7889317274093628), (1321031473.0, 0.609708845615387), (1321031474.0,
> 0.9120397567749023), (1321031475.0, 0.2386845201253891)]
> [(1321031471.0, 0.6449417471885681) (1321031472.0, 0.7889317274093628)
> (1321031473.0, 0.609708845615387) (1321031474.0, 0.9120397567749023)
> (1321031475.0, 0.2386845201253891)]
> ---end output---
> 
> Creating the index after the data has been inserted produces the same
> behavior (no data is returned while the index exists).  Any
> suggestions would be greatly appreciated.

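I've reproduced this with a number of different index configurations. If I
change the column type to Float64, then the index works as expected. In the
results below, BEFORE is the original Time64 column and AFTER is the same
run with the column changed to Float64.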
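
For reference, the Float64 variant can be sketched roughly as follows. This
is not the exact script behind the numbers below. It assumes the PyTables 3.x
method names (open_file, create_table, create_index), whereas the 2.3.1 test
above uses openFile, createTable and createIndex, and the helper name
indexed_window_query is made up for illustration.

```python
import os
import tempfile

import tables  # assumption: PyTables 3.x naming, not the 2.3.1 camelCase API


class Descr(tables.IsDescription):
    # Workaround: keep the timestamp in a Float64 column instead of Time64
    when = tables.Float64Col(pos=1)
    value = tables.Float32Col(pos=2)


def indexed_window_query(path, start, n_rows=1000, window=5):
    """Fill a table, index 'when', and query [start, start + window)."""
    cond = "(when >= %r) & (when < %r)" % (start, start + window)
    with tables.open_file(path, "w") as h5f:
        tbl = h5f.create_table("/", "test", Descr)
        tbl.append([(start + i, float(i)) for i in range(n_rows)])
        tbl.flush()
        tbl.cols.when.create_index()
        # Consume the iterator while the file is still open
        hits = [row["when"] for row in tbl.where(cond)]
        return tbl.cols.when.is_indexed, hits


t = 1321031471.0
path = os.path.join(tempfile.mkdtemp(), "tmp.h5")
is_indexed, hits = indexed_window_query(path, t)
print("is_indexed =", is_indexed, "matches =", len(hits))
```

Storing the timestamp as a plain float keeps the same seconds-since-epoch
values, so existing query strings do not need to change.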

BEFORE:
Initial index: verbose      has_index= True   use_index= frozenset(['Awhen'])  where= 0  readWhere= 0
remove index                has_index= False  use_index= frozenset([])         where= 5  readWhere= 5
re-add index (non-verbose)  has_index= True   use_index= frozenset(['Awhen'])  where= 0  readWhere= 0
remove again                has_index= False  use_index= frozenset([])         where= 5  readWhere= 5
re-add index (with flush)   has_index= True   use_index= frozenset(['Awhen'])  where= 0  readWhere= 0
re-add index (full)         has_index= True   use_index= frozenset(['Awhen'])  where= 0  readWhere= 0
re-add index (ultralight)   has_index= True   use_index= frozenset(['Awhen'])  where= 0  readWhere= 0
re-add index (o=0)          has_index= True   use_index= frozenset(['Awhen'])  where= 0  readWhere= 0
re-add index (o=9)          has_index= True   use_index= frozenset(['Awhen'])  where= 0  readWhere= 0
re-index                    has_index= True   use_index= frozenset(['Awhen'])  where= 0  readWhere= 0
also index value            has_index= True   use_index= frozenset(['Awhen'])  where= 0  readWhere= 0


AFTER:
Initial index: verbose      has_index= True   use_index= frozenset(['Awhen'])  where= 5  readWhere= 5
remove index                has_index= False  use_index= frozenset([])         where= 5  readWhere= 5
re-add index (non-verbose)  has_index= True   use_index= frozenset(['Awhen'])  where= 5  readWhere= 5
remove again                has_index= False  use_index= frozenset([])         where= 5  readWhere= 5
re-add index (with flush)   has_index= True   use_index= frozenset(['Awhen'])  where= 5  readWhere= 5
re-add index (full)         has_index= True   use_index= frozenset(['Awhen'])  where= 5  readWhere= 5
re-add index (ultralight)   has_index= True   use_index= frozenset(['Awhen'])  where= 5  readWhere= 5
re-add index (o=0)          has_index= True   use_index= frozenset(['Awhen'])  where= 5  readWhere= 5
re-add index (o=9)          has_index= True   use_index= frozenset(['Awhen'])  where= 5  readWhere= 5
re-index                    has_index= True   use_index= frozenset(['Awhen'])  where= 5  readWhere= 5
also index value            has_index= True   use_index= frozenset(['Awhen'])  where= 5  readWhere= 5
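
A check matrix like the one above could be generated along these lines. This
is only a sketch, not the script that produced those rows: it covers a subset
of the configurations, the function name run_matrix and the labels are
hypothetical, and it assumes the PyTables 3.x method names (create_index,
remove_index, read_where, will_query_use_indexing).

```python
import os
import tempfile

import tables  # assumption: PyTables 3.x naming


class Descr(tables.IsDescription):
    when = tables.Float64Col(pos=1)  # swap in Time64Col to probe the failing case
    value = tables.Float32Col(pos=2)


def run_matrix(path, t=1321031471.0):
    """Return [(label, has_index, use_index, n_where, n_read_where), ...]."""
    cond = "(when >= %r) & (when < %r)" % (t, t + 5)
    configs = [
        ("default", {}),
        ("full", {"kind": "full"}),
        ("ultralight", {"kind": "ultralight"}),
        ("o=0", {"optlevel": 0}),
        ("o=9", {"optlevel": 9}),
    ]
    results = []
    with tables.open_file(path, "w") as h5f:
        tbl = h5f.create_table("/", "test", Descr)
        tbl.append([(t + i, float(i)) for i in range(1000)])
        tbl.flush()

        def record(label):
            results.append((
                label,
                tbl.cols.when.is_indexed,
                tbl.will_query_use_indexing(cond),
                sum(1 for _ in tbl.where(cond)),   # iterator-based query
                len(tbl.read_where(cond)),         # array-returning query
            ))

        record("no index")
        for label, kwargs in configs:
            tbl.cols.when.create_index(**kwargs)
            record(label)
            # An existing index must be removed before creating another
            tbl.cols.when.remove_index()
    return results


matrix_path = os.path.join(tempfile.mkdtemp(), "matrix.h5")
results = run_matrix(matrix_path)
for label, has_index, use_index, n_where, n_read in results:
    print(label, "has_index=", has_index, "use_index=", use_index,
          "where=", n_where, "readWhere=", n_read)
```

Comparing the where and readWhere counts against the unindexed row is what
exposes the problem: any indexed configuration that reports fewer matches
than "no index" is returning wrong results.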

Cheers,
~Josh.


> Alan


_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
