Why not read in just the date and ID columns to start with, then run
numpy.unique() or a Python set() on these, and query based on the unique
values?  It seems like that might be faster.
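
A minimal sketch of that idea, using only the standard library. The two
lists below are stand-in data; with PyTables they would presumably be the
two key columns read on their own (for example via tbl.col('date') and
tbl.col('userID'), which is an assumption about your table layout). The
helper name find_duplicate_keys is hypothetical.

```python
from collections import Counter


def find_duplicate_keys(dates, user_ids):
    """Return the sorted (date, userID) pairs that occur more than once."""
    # Count every key pair in a single pass instead of querying per row.
    counts = Counter(zip(dates, user_ids))
    return sorted(key for key, n in counts.items() if n > 1)


# Stand-in data for illustration only; in practice these would be the
# 'date' and 'userID' columns read from the table (assumed, not tested
# against a real file).
dates = [20120701, 20120701, 20120702, 20120701]
user_ids = ["alice", "bob", "alice", "alice"]

duplicates = find_duplicate_keys(dates, user_ids)
print(duplicates)
```

Only the keys that come back as duplicates would then need a readWhere()
query, rather than one query per row in the table.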

Be Well
Anthony

On Mon, Jul 2, 2012 at 5:16 PM, Aquil H. Abdullah
<aquil.abdul...@gmail.com> wrote:

> Hello All,
>
> I have a table that is indexed by two keys, and I would like to search for
> duplicate keys.  So here is my naive slow implementation: (code I posted on
> stackoverflow)
>
> import tables
>
> h5f = tables.openFile('filename.h5')
> # assumes group data and table data_table
> tbl = h5f.getNode('/data', 'data_table')
>
> counter = 0
> for row in tbl:
>     ts = row['date']  # timestamp (ts) or date
>     uid = row['userID']
>     query = '(date == %d) & (userID == "%s")' % (ts, uid)
>     result = tbl.readWhere(query)
>     if len(result) > 1:
>         # Do something here
>         pass
>     counter += 1
>     if counter % 1000 == 0:
>         print '%d rows processed' % counter
>
>
>
> --
> Aquil H. Abdullah
> aquil.abdul...@gmail.com
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
>