I've been finding PyTables useful for organizing big genomics data (e.g. storing and querying ~200 GB of all-vs-all UniParc Smith-Waterman hits from UniProt).
One thing that has surprised me a little: I was interested in the efficiency of querying a small table that stores an index (integer) and an integer value, versus storing the values in an array. I am finding the second option about 10x faster when selecting the index of a particular integer value. I would have guessed that the in-kernel selection would be faster than reading out the entire array and then using numpy.where(). Is this expected, or can I do something to make the table selection faster? In this case I am fine with the second option, so this is just for future reference.

E.g.

Option 1:

    from tables import IsDescription, UInt8Col, UInt32Col

    class testTable(IsDescription):
        index = UInt8Col(pos=0)
        id = UInt32Col(pos=1)

    h5_file.createTable(group, 'test1', testTable, expectedrows=5000)

    def fxn1(group, id):
        """Retrieve rows from pytables table."""
        return [x['index'] for x in group.test1.where("id == %s" % id)]

#########################

Option 2:

    import numpy

    z = numpy.array([id1, id2, ...])
    h5_file.createArray(group, 'test2', z)

    def fxn2(group, id):
        """Retrieve rows from pytables array.

        About 10x faster than selecting from table!
        """
        # Read the whole array into memory, then search it with numpy.
        return numpy.where(group.test2.read() == id)[0]

Thanks,
Rich
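
For anyone who wants to reproduce the comparison, here is a minimal self-contained sketch of the two options plus a third variant. It is not the code that produced the 10x figure above. It assumes the PyTables 3.x method names (open_file, create_table, create_array, get_where_list) rather than the camelCase ones used in the post, and the names HitTable, build_file, lookup_table, lookup_table_coords and lookup_array are made up for illustration. The index column is widened to UInt32 here so that 5000 row numbers fit. The third variant asks the table for matching row coordinates directly, which avoids building a Python list row by row; whether that closes the gap on a given machine is something to measure, not assume.

```python
import os
import tempfile
import timeit

import numpy
import tables


class HitTable(tables.IsDescription):
    # Hypothetical description mirroring the post's table layout.
    index = tables.UInt32Col(pos=0)
    id = tables.UInt32Col(pos=1)


def build_file(path, ids):
    """Store the same ids as a table ('test1') and as an array ('test2')."""
    h5_file = tables.open_file(path, mode="w")
    group = h5_file.create_group("/", "demo")
    table = h5_file.create_table(group, "test1", HitTable,
                                 expectedrows=len(ids))
    table.append([(i, int(v)) for i, v in enumerate(ids)])
    table.flush()
    h5_file.create_array(group, "test2",
                         numpy.asarray(ids, dtype=numpy.uint32))
    return h5_file, group


def lookup_table(group, target):
    """Option 1: in-kernel query, one Python-level step per matching row."""
    rows = group.test1.where("id == target", condvars={"target": target})
    return [int(row["index"]) for row in rows]


def lookup_table_coords(group, target):
    """Variant: ask the table for the matching row numbers in one call."""
    return group.test1.get_where_list("id == target",
                                      condvars={"target": target})


def lookup_array(group, target):
    """Option 2: read the whole array, then search it with numpy."""
    return numpy.where(group.test2.read() == target)[0]


if __name__ == "__main__":
    ids = numpy.random.randint(0, 1000, size=5000)
    path = os.path.join(tempfile.mkdtemp(), "lookup_demo.h5")
    h5_file, group = build_file(path, ids)
    target = int(ids[0])
    for fn in (lookup_table, lookup_table_coords, lookup_array):
        seconds = timeit.timeit(lambda: fn(group, target), number=50)
        print("%-20s %.4f s for 50 lookups" % (fn.__name__, seconds))
    h5_file.close()
```

With only a few thousand rows, fixed per-query overhead is likely to dominate in all three cases, so the timings are best read as relative numbers for one particular setup.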