Hi Jon Olav,

On Tuesday 07 July 2009 13:44:30, Jon Olav Vik wrote:
> The problem in brief: Why does it take 20-40 seconds to extract a table
> column of 200000 integers? The code snippet in question is:
>
> with pt.openFile(filename) as f:
>     vlarrayrow = f.root.gp.cols.vlarrayrow[:]

Quick answer: when your dataset fits in the OS filesystem memory cache, 
retrieval is very fast.  If not, your disk speed is the upper bound.
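
To see the cache effect directly, here is a minimal timing sketch (the file 
name is hypothetical; the node path is taken from your snippet): the first, 
cold read is bounded by disk speed, while repeated reads are served from the 
OS cache.

import time
import tables as pt

def time_column_reads(filename, n=3):
    # Read the same column several times; the first read may hit the
    # disk, later ones should come from the OS filesystem cache.
    for i in range(n):
        t0 = time.time()
        f = pt.openFile(filename)               # PyTables 2.x API
        try:
            col = f.root.gp.cols.vlarrayrow[:]  # full column read
        finally:
            f.close()
        print("read %d: %d values in %.3f s" % (i, len(col), time.time() - t0))

time_column_reads("gp.h5")  # hypothetical file name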

For example, on my 8 GB machine, the results of your benchmark are:

INFO:root:0.270664930344 seconds, (nrow,othersize=50000,2000)                   
0.270664930344                                                                  
INFO:root:0.232949972153 seconds, (nrow,othersize=50000,2000)                   
INFO:root:0.235331773758 seconds, (nrow,othersize=52000,2000)                   
INFO:root:0.244709968567 seconds, (nrow,othersize=54000,2000)                   
INFO:root:0.245009899139 seconds, (nrow,othersize=56000,2000)                   
INFO:root:0.276184082031 seconds, (nrow,othersize=58000,2000)                   
INFO:root:0.262171030045 seconds, (nrow,othersize=50000,2200)                   
INFO:root:0.339207172394 seconds, (nrow,othersize=52000,2200)                   
INFO:root:0.288460016251 seconds, (nrow,othersize=54000,2200)                   
INFO:root:0.294430017471 seconds, (nrow,othersize=56000,2200)                   
INFO:root:0.302571773529 seconds, (nrow,othersize=58000,2200)                   
INFO:root:0.279940843582 seconds, (nrow,othersize=50000,2400)                   
INFO:root:0.290556192398 seconds, (nrow,othersize=52000,2400)                   
INFO:root:0.309056997299 seconds, (nrow,othersize=54000,2400)                   
INFO:root:0.317222118378 seconds, (nrow,othersize=56000,2400)                   
INFO:root:0.327784061432 seconds, (nrow,othersize=58000,2400)                   

Here, the time to retrieve the first table (381 MB) is around 0.23 s, which 
works out to approximately 1.6 GB/s, while the last table (531 MB) takes 
0.33 s, which works out to the same 1.6 GB/s.  So I can't see the performance 
gap at all.  The reason is that both sizes fit comfortably in the OS 
filesystem cache, so they can be transferred at RAM speeds.  However, if I 
try with a much larger table (19 GB), I get:

INFO:root:110.066845894 seconds, (nrow,othersize=1000000,5000)
110.066845894

which works out to 173 MB/s: the speed of my disk, and roughly 10x slower 
than my memory subsystem.  In your case, my guess is that your table sizes 
are right at the limit of the RAM available for OS caching, hence the 
apparently erratic read speeds.
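
If you want to check that hypothesis, here is a rough, Linux-specific sketch 
(the file name is hypothetical) that compares your file size against the 
memory the kernel has free or is already using for the page cache; when the 
file is comparable to or larger than that figure, part of every read has to 
come from disk:

import os

def cache_headroom(filename):
    # Compare the HDF5 file size against free + cached RAM (Linux only).
    fsize = os.path.getsize(filename)
    meminfo = {}
    for line in open("/proc/meminfo"):
        key, value = line.split(":")
        meminfo[key.strip()] = int(value.split()[0]) * 1024  # kB -> bytes
    cacheable = meminfo["MemFree"] + meminfo["Cached"]
    print("file: %.0f MB, free+cached RAM: %.0f MB"
          % (fsize / 2.0**20, cacheable / 2.0**20))

cache_headroom("gp.h5")  # hypothetical file name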

As you may have noticed, I've counted the *total* size of the table as being 
read, not the size of a single column.  This is because the table is 
organized row-wise on disk, so you need to read *all* the columns in order 
to access just one.  The only real solution is a column-wise table, which 
I'd like to implement in the near future (but it is not there yet).  In the 
meantime, there is a workaround; see the sketch below.
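
A common workaround (a sketch, with hypothetical file and node names taken 
from your snippet) is to copy a heavily read column into its own Array node 
once, so that later reads touch only that column's bytes on disk:

import tables as pt

f = pt.openFile("gp.h5", "a")  # hypothetical file name
tbl = f.root.gp

# One-time copy of the column into a standalone, contiguous array.
f.createArray(f.root, "vlarrayrow_col", tbl.cols.vlarrayrow[:])
f.flush()

# From now on, this read does not drag the other columns along:
col = f.root.vlarrayrow_col[:]
f.close()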

At any rate, the fact that table access is so fast when the table fits in 
the OS cache is a good indication that PyTables is behaving well and getting 
the most out of the underlying hardware :)

Cheers,

-- 
Francesc Alted
