Hmmmm, I've been doing some more tests with the following surprising (at
least for me) results. If instead of using:

dict((name, set(t.colinstances[name])) for name in t.colnames)

I use:

names = t.colnames
result = dict((name, set()) for name in names)
for row in t:
    for name in names:
        result[name].add(row[name])

the operation runs very fast.
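
Spelled out as a self-contained sketch (the helper name unique_per_column and
the sample rows are made up for illustration; the only assumption is that each
row can be indexed by column name, as in the loop above):

```python
def unique_per_column(rows, names):
    """Collect the distinct values of each named column in a single pass."""
    # One empty set per column, filled while iterating over the rows once.
    result = dict((name, set()) for name in names)
    for row in rows:
        for name in names:
            result[name].add(row[name])
    return result


# Stand-in data: plain dicts play the role of table rows here.
rows = [
    {"a": 1, "b": "x"},
    {"a": 2, "b": "x"},
    {"a": 1, "b": "y"},
]
uniques = unique_per_column(rows, ["a", "b"])
```

With the real table this would presumably be called as
unique_per_column(t, t.colnames).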

Does anybody know what's going on?

Armando

On Tue, May 12, 2009 at 12:29 PM, Armando Serrano Lombillo <
arser...@gmail.com> wrote:

> Compression: zlib, level 1.
> Size: 150 MB (compressed) but it could be even bigger, or it could be less
> than 1 MB. Anyway, even with small files, I find it slower than I would
> expect.
> Available memory: depends. I am now running it with 512 MB of RAM.
> Expectedrows: no, I didn't know about it.
> Other information: I first create the table, save the file and close it. At
> the time of creating the table I can't know how many rows there will be. I
> then reopen the file and extract the unique values.
> I'm running it on Windows XP, python 2.5.
>
>
> On Tue, May 12, 2009 at 11:33 AM, Francesc Alted <fal...@pytables.org> wrote:
>
>> On Tuesday 12 May 2009 10:02:53 Armando Serrano Lombillo wrote:
>> > Hello list. I have a (potentially very big) table in PyTables. I now want
>> > to extract all the unique values of each column. I have tried doing:
>> >
>> > dict((name, set(t.colinstances[name])) for name in t.colnames)
>> >
>> > (where t is of course the table), but it is VERY slow.
>> >
>> > Is there a faster way?
>>
>> We need a bit more info.  How large exactly is your table in comparison
>> with your available memory?  Are you using compression?  If yes, which
>> compressor exactly?  Finally, have you specified the `expectedrows`
>> parameter in the Table constructor?
>>
>> Cheers,
>>
>> --
>> Francesc Alted
>>
>>
>> ------------------------------------------------------------------------------
>> The NEW KODAK i700 Series Scanners deliver under ANY circumstances! Your
>> production scanning environment may not be a perfect world - but thanks to
>> Kodak, there's a perfect scanner to get the job done! With the NEW KODAK
>> i700
>> Series Scanner you'll get full speed at 300 dpi even with all image
>> processing features enabled. http://p.sf.net/sfu/kodak-com
>> _______________________________________________
>> Pytables-users mailing list
>> Pytables-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/pytables-users
>>
>
>