Hi Ernesto,

On Wednesday 28 April 2010 16:04:06 Ernesto wrote:
> Hi all,
> 
> I have a table containing a lot of data (millions of rows).
> The structure is like the following example:
> 
> (22777420, 'G', 18, '-')
> (22777421, 'G', 36, '-')
> (22777422, 'C', 29, '-')
> (22777423, 'C', 17, '-')
> (22777424, 'A', 31, '-')
> (22777425, 'A', 42, '-')
> (22777426, 'C', 49, '-')
> (22777305, 'T', 0, '-')
> (22777306, 'C', 18, '-')
> (22777307, 'C', 29, '-')
> (22777308, 'T', 26, '-')
> (22777309, 'T', 10, '-')
> (22777310, 'G', 15, '-')
> (22777311, 'G', 33, '-')
> 
> The first column contains an integer. Now I'd like to sort my table
>  according to numbers of the first column. Is there a way to perform this
>  action?

Yes.  The simplest way is to pass the name of the column you want to sort by 
as the `sortby` parameter of the `Table.copy()` method; that column must be 
indexed.  The copy is then written out in sorted order on disk, so you don't 
have to worry about your available memory.  You will need the Pro version to 
get this capability.
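
A minimal sketch of such a sorted copy, in case it helps.  It uses the 
PyTables 3.x method names (`open_file`, `create_table`, `create_csindex`) and 
assumes an installation where indexing is available.  The file name, the node 
names and the column names (`position`, `base`, `quality`, `strand`) are made 
up for illustration, since the original post does not name them:

```python
import os
import tempfile

import tables


class Read(tables.IsDescription):
    # Hypothetical column names and types; the original post shows only values.
    position = tables.Int64Col(pos=0)
    base = tables.StringCol(1, pos=1)
    quality = tables.Int32Col(pos=2)
    strand = tables.StringCol(1, pos=3)


# A few of the rows from the question, deliberately out of order.
rows = [
    (22777420, b"G", 18, b"-"),
    (22777421, b"G", 36, b"-"),
    (22777305, b"T", 0, b"-"),
    (22777306, b"C", 18, b"-"),
]

path = os.path.join(tempfile.mkdtemp(), "sorted_copy_demo.h5")
with tables.open_file(path, mode="w") as h5:
    table = h5.create_table("/", "reads", Read)
    table.append(rows)
    table.flush()

    # sortby needs an index on the column; a completely sorted index (CSI)
    # is what guarantees a fully ordered copy.
    table.cols.position.create_csindex()

    # Write a second table whose rows follow the order of the index.
    sorted_table = table.copy(newname="reads_sorted", sortby="position",
                              checkCSI=True)
    sorted_positions = [int(p) for p in sorted_table.col("position")]

print(sorted_positions)
```

With `checkCSI=True` the copy is refused if the index is not completely 
sorted, instead of silently producing an approximately sorted table.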

>  A second question concerns the iteration over a huge amount of
>  data. For example, given the above table, I would like to work on a subset of
>  rows using an iterator in order to avoid memory errors. Is there also here
>  a simple procedure?

I think what you are looking for is the `Table.where()` iterator.  See:

http://www.pytables.org/docs/manual/ch04.html#TableMethods_querying
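
As a rough illustration of that iterator (PyTables 3.x method names again; the 
column names and the condition are only assumptions for the example), the rows 
are yielded one at a time, so the whole table never has to fit in memory:

```python
import os
import tempfile

import tables


class Read(tables.IsDescription):
    # Hypothetical column names and types; the original post shows only values.
    position = tables.Int64Col(pos=0)
    base = tables.StringCol(1, pos=1)
    quality = tables.Int32Col(pos=2)
    strand = tables.StringCol(1, pos=3)


rows = [
    (22777420, b"G", 18, b"-"),
    (22777421, b"G", 36, b"-"),
    (22777305, b"T", 0, b"-"),
    (22777306, b"C", 18, b"-"),
    (22777307, b"C", 29, b"-"),
]

path = os.path.join(tempfile.mkdtemp(), "where_demo.h5")
with tables.open_file(path, mode="w") as h5:
    table = h5.create_table("/", "reads", Read)
    table.append(rows)
    table.flush()

    # The condition is evaluated in chunks inside PyTables; only the
    # matching rows reach the Python loop.
    condition = "(quality > 17) & (position < 22777400)"
    selected = []
    for row in table.where(condition):
        # Copy out the fields you need; the Row object is reused between
        # iterations.
        selected.append((int(row["position"]), int(row["quality"])))

print(selected)
```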

Also, the Pro version has the ability to index your tables, making your 
queries via `Table.where()` very fast (especially over completely sorted 
tables).  For some figures on the improvements you can expect, see:

http://www.pytables.org/docs/manual/ch05.html#searchOptim

and, in particular:

http://www.pytables.org/docs/manual/ch05.html#Sorting-indexes
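
For reference, a small sketch of indexing a column before querying it, 
assuming the PyTables 3.x method names and an installation where indexing is 
available.  The column names are hypothetical, and on a table this tiny no 
speed-up would be measurable; the point is only the sequence of calls:

```python
import os
import tempfile

import tables


class Read(tables.IsDescription):
    # Hypothetical column names and types; the original post shows only values.
    position = tables.Int64Col(pos=0)
    base = tables.StringCol(1, pos=1)
    quality = tables.Int32Col(pos=2)
    strand = tables.StringCol(1, pos=3)


rows = [
    (22777420, b"G", 18, b"-"),
    (22777305, b"T", 0, b"-"),
    (22777421, b"G", 36, b"-"),
    (22777306, b"C", 18, b"-"),
]

path = os.path.join(tempfile.mkdtemp(), "index_demo.h5")
with tables.open_file(path, mode="w") as h5:
    table = h5.create_table("/", "reads", Read)
    table.append(rows)
    table.flush()

    # Build a completely sorted index on the column used in the query.
    table.cols.position.create_csindex()
    indexed = table.cols.position.is_indexed

    # The query is written exactly as for an unindexed table.
    hits = sorted(int(row["position"])
                  for row in table.where("position < 22777400"))

print(indexed, hits)
```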

Cheers,

-- 
Francesc Alted

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
