Ciao Ernesto,

On Wednesday 28 April 2010 16:04:06 Ernesto wrote:
> Hi all,
>
> I have a table containing a lot of data (millions of rows).
> The structure is like the following example:
>
> (22777420, 'G', 18, '-')
> (22777421, 'G', 36, '-')
> (22777422, 'C', 29, '-')
> (22777423, 'C', 17, '-')
> (22777424, 'A', 31, '-')
> (22777425, 'A', 42, '-')
> (22777426, 'C', 49, '-')
> (22777305, 'T', 0, '-')
> (22777306, 'C', 18, '-')
> (22777307, 'C', 29, '-')
> (22777308, 'T', 26, '-')
> (22777309, 'T', 10, '-')
> (22777310, 'G', 15, '-')
> (22777311, 'G', 33, '-')
>
> The first column contains an integer. Now I'd like to sort my table
> according to the numbers in the first column. Is there a way to perform
> this action?

Yes. The simplest way is to pass the name of that column as the `sortby`
parameter of the `Table.copy()` method (the column needs an index for this
to work). This triggers an on-disk sorting operation, so you don't have to
worry about your available memory. You will need the Pro version to get
this capability.

> A second question concerns the iteration over a huge amount of
> data. For example, given the above table, I would like to work on a
> subset of rows using an iterator in order to avoid memory errors. Is
> there also here a simple procedure?

I think what you are looking for is the `Table.where()` iterator. See:

http://www.pytables.org/docs/manual/ch04.html#TableMethods_querying

Also, the Pro version can index your tables, making your queries via
`Table.where()` very fast (especially over completely sorted tables). For
some figures on the improvements you can expect, see:

http://www.pytables.org/docs/manual/ch05.html#searchOptim

and, in particular:

http://www.pytables.org/docs/manual/ch05.html#Sorting-indexes

Cheers,

-- 
Francesc Alted
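
For illustration, here is a minimal sketch of both suggestions: a sorted copy via `Table.copy()` with `sortby`, and a row-by-row query via `Table.where()`. It assumes a recent PyTables release (3.x API names such as `open_file` and `create_table`), where indexing is part of the standard distribution. The column names (`position`, `base`, `score`, `strand`), the node names and the file name are hypothetical, since the original post does not give them, and only a few of the sample rows are used.

```python
import os
import tempfile

import tables


class Base(tables.IsDescription):
    # Hypothetical column names; the original post only shows the values.
    position = tables.Int64Col(pos=0)
    base = tables.StringCol(1, pos=1)
    score = tables.Int32Col(pos=2)
    strand = tables.StringCol(1, pos=3)


# A few of the rows from the question, deliberately out of order.
rows = [
    (22777420, b"G", 18, b"-"),
    (22777421, b"G", 36, b"-"),
    (22777305, b"T", 0, b"-"),
    (22777306, b"C", 18, b"-"),
]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "bases.h5")  # hypothetical file name
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "unsorted", Base)
        table.append(rows)
        table.flush()

        # sortby expects the name of an indexed column; a completely
        # sorted index (CSI) is what guarantees a fully sorted copy.
        table.cols.position.create_csindex()
        sorted_table = table.copy(newname="sorted", sortby="position")
        sorted_pos = [int(p) for p in sorted_table.col("position")]

        # Table.where() yields matching rows one at a time, so only the
        # values we explicitly keep end up in memory.
        condition = "(position >= 22777300) & (position < 22777400)"
        subset = [
            (int(row["position"]), row["base"].decode())
            for row in sorted_table.where(condition)
        ]

print(sorted_pos)
print(subset)
```

With millions of rows you would process each row inside the `where()` loop instead of collecting the matches into a list, which is done here only to keep the example short.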