On Tuesday 12 May 2009 14:00:41 Armando Serrano Lombillo wrote:
> Ok, it looks like we were writing similar emails at the same time. :)
>
> I'll change my code right away, but I'm still interested in what exactly
> was slowing my first approach. Was it the way I accessed the file, that is,
> is t.c
Oh, and it also surprises me that using a dictionary of Nones is faster than
using a set. Maybe Python's set type needs some performance optimizations,
but that has nothing to do with PyTables.
Armando.
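
For anyone who wants to check the set versus dictionary-of-Nones observation on their own machine, here is a minimal timing sketch. It uses only the standard library and does not touch PyTables; the function names and the sample data are made up for illustration, and the timings will vary with the Python version and the data.

```python
import timeit


def unique_with_set(values):
    """Collect the distinct values in a set."""
    seen = set()
    for v in values:
        seen.add(v)
    return seen


def unique_with_dict(values):
    """Collect the distinct values as keys of a dict whose values are all None."""
    seen = {}
    for v in values:
        seen[v] = None
    return seen


# Hypothetical sample data: 100,000 values, 1,000 of them distinct.
sample = [i % 1000 for i in range(100_000)]

if __name__ == "__main__":
    for fn in (unique_with_set, unique_with_dict):
        elapsed = timeit.timeit(lambda: fn(sample), number=10)
        print(f"{fn.__name__}: {elapsed:.3f} s for 10 runs")
```

Both functions end up with the same distinct values, so any difference is purely in speed.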
On Tue, May 12, 2009 at 2:00 PM, Armando Serrano Lombillo wrote:
> Ok,
Ok, it looks like we were writing similar emails at the same time. :)
I'll change my code right away, but I'm still interested in what exactly was
slowing my first approach. Was it the way I accessed the file, that is, is
t.colinstances[ind] slow? Or was it that directly building the set is slower
On Tuesday 12 May 2009 12:29:04 Armando Serrano Lombillo wrote:
> Compression: zlib, level 1.
> Size: 150 MB (compressed) but it could be even bigger, or it could be less
> than 1 MB. Anyway, even with small files, I find it slower than I would
> expect.
> Available memory: depends. I am now runnin
Hi, I've been doing some more tests with the following surprising (at
least for me) results. If instead of using:
dict((name, set(t.colinstances[name])) for name in t.colnames)
I use:
names = t.colnames
result = dict((name, set()) for name in names)
for row in t:
    for name in names:
Compression: zlib, level 1.
Size: 150 MB (compressed) but it could be even bigger, or it could be less
than 1 MB. Anyway, even with small files, I find it slower than I would
expect.
Available memory: depends. I am now running it with 512 MB of RAM.
Expectedrows: no, I didn't know about it.
Other i
On Tuesday 12 May 2009 10:02:53 Armando Serrano Lombillo wrote:
> Hello list. I have a (potentially very big) table in PyTables. I now want
> to extract all the unique values of each column. I have tried doing:
>
> dict((name, set(t.colinstances[name])) for name in t.colnames)
>
> (where t is of cou
Hello list. I have a (potentially very big) table in PyTables. I now want to
extract all the unique values of each column. I have tried doing:
dict((name, set(t.colinstances[name])) for name in t.colnames)
(where t is of course the table), but it is VERY slow.
Is there a faster way?
Armando.
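
For reference, the row-by-row alternative discussed in this thread can be sketched as a single pass that fills one set per column. This is only an illustration under assumptions: the helper name `unique_per_column` and the sample rows are made up, and the loop body is a guess at the intended logic rather than the code from the thread. It runs on plain dictionaries; a PyTables table should also work as `rows` with `t.colnames` as `names`, since iterating a table yields rows that can be indexed by column name, but that is not tested here.

```python
def unique_per_column(rows, names):
    """Return {column name: set of distinct values}, reading each row once."""
    result = {name: set() for name in names}
    for row in rows:
        for name in names:
            # row[name] is the value of column `name` in the current row.
            result[name].add(row[name])
    return result


# Hypothetical stand-in for a table: a list of dict rows.
rows = [
    {"city": "Madrid", "year": 2008},
    {"city": "Bilbao", "year": 2009},
    {"city": "Madrid", "year": 2009},
]

uniques = unique_per_column(rows, ["city", "year"])
```

Whether this beats building each set straight from a column will depend on the table size, compression and available memory, so it is worth timing both on real data.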
---