Re: [Pytables-users] Extracting unique values of a column

2009-05-12 Thread Francesc Alted
On Tuesday 12 May 2009 14:00:41 Armando Serrano Lombillo wrote: > Ok, it looks like we were writing similar emails at the same time. :) > > I'll change my code right away, but I'm still interested in what exactly > was slowing my first approach. Was it the way I accessed the file, that is, > is t.c

Re: [Pytables-users] Extracting unique values of a column

2009-05-12 Thread Armando Serrano Lombillo
Oh, and it also surprises me that using a dictionary of Nones is faster than using a set. Maybe Python's set type needs some performance optimizations, but that has nothing to do with PyTables. Armando. On Tue, May 12, 2009 at 2:00 PM, Armando Serrano Lombillo wrote: > Ok,
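
For reference, the two containers compared in this message can be sketched as below. This is only an illustration: the sample data and the helper names `unique_with_set` and `unique_with_dict` are hypothetical, and no timing claim is implied, since relative speed depends on the Python version in use.

```python
# Hypothetical sample data standing in for one column's values.
values = ["a", "b", "a", "c", "b"]


def unique_with_set(values):
    """Collect unique values in a set."""
    seen = set()
    for v in values:
        seen.add(v)
    return seen


def unique_with_dict(values):
    """Collect unique values as the keys of a dict whose values are all None."""
    seen = {}
    for v in values:
        seen[v] = None
    return seen


from_set = unique_with_set(values)
from_dict = unique_with_dict(values)
```

Both hold the same unique values; `set(from_dict)` converts the dictionary keys back into a set if one is needed.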

Re: [Pytables-users] Extracting unique values of a column

2009-05-12 Thread Armando Serrano Lombillo
Ok, it looks like we were writing similar emails at the same time. :) I'll change my code right away, but I'm still interested in what exactly was slowing my first approach. Was it the way I accessed the file, that is, is t.colinstances[ind] slow? Or was it that directly building the set is slower

Re: [Pytables-users] Extracting unique values of a column

2009-05-12 Thread Francesc Alted
On Tuesday 12 May 2009 12:29:04 Armando Serrano Lombillo wrote: > Compression: zlib, level 1. > Size: 150 MB (compressed) but it could be even bigger, or it could be less > than 1 MB. Anyway, even with small files, I find it slower than I would > expect. > Available memory: depends. I am now runnin

Re: [Pytables-users] Extracting unique values of a column

2009-05-12 Thread Armando Serrano Lombillo
Hi, I've been doing some more tests with the following surprising (at least for me) results. If instead of using: dict((name, set(t.colinstances[name])) for name in t.colnames) I use: names = t.colnames result = dict((name, set()) for name in names) for row in t: for name in names:
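
The two approaches compared in this message can be sketched without PyTables. In the minimal sketch below, a list of dicts stands in for the table, and the helper names `unique_by_column` and `unique_by_row` are hypothetical. The loop body is cut off in the archived message, so the `add` call is an assumption about how it continued, not the author's confirmed code.

```python
# Stand-in for a PyTables table: a list of row dicts. This is an assumption
# for illustration; the real object in the thread is a table `t`.
rows = [
    {"city": "Madrid", "year": 2008},
    {"city": "Bilbao", "year": 2009},
    {"city": "Madrid", "year": 2009},
]
names = ["city", "year"]


def unique_by_column(rows, names):
    """First approach: one full pass over the data for each column."""
    return dict((name, set(row[name] for row in rows)) for name in names)


def unique_by_row(rows, names):
    """Second approach: a single pass, updating every column's set per row."""
    result = dict((name, set()) for name in names)
    for row in rows:
        for name in names:
            # Assumed continuation of the truncated loop in the message.
            result[name].add(row[name])
    return result


by_column = unique_by_column(rows, names)
by_row = unique_by_row(rows, names)
```

Both functions return the same dictionary of sets. They differ only in how many passes they make over the data, which may matter when each pass has to read and decompress the table from disk.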

Re: [Pytables-users] Extracting unique values of a column

2009-05-12 Thread Armando Serrano Lombillo
Compression: zlib, level 1. Size: 150 MB (compressed) but it could be even bigger, or it could be less than 1 MB. Anyway, even with small files, I find it slower than I would expect. Available memory: depends. I am now running it with 512 MB of RAM. Expectedrows: no, I didn't know about it. Other i

Re: [Pytables-users] Extracting unique values of a column

2009-05-12 Thread Francesc Alted
On Tuesday 12 May 2009 10:02:53 Armando Serrano Lombillo wrote: > Hello list. I have a (potentially very big) table in PyTables. I now want > to extract all the unique values of each column. I have tried doing: > > dict((name, set(t.colinstances[name])) for name in t.colnames) > > (where t is of cou

[Pytables-users] Extracting unique values of a column

2009-05-12 Thread Armando Serrano Lombillo
Hello list. I have a (potentially very big) table in PyTables. I now want to extract all the unique values of each column. I have tried doing: dict((name, set(t.colinstances[name])) for name in t.colnames) (where t is of course the table), but it is VERY slow. Is there a faster way? Armando.