On Tuesday 12 May 2009 14:00:41 Armando Serrano Lombillo wrote:
> Ok, it looks like we were writing similar emails at the same time. :)
>
> I'll change my code right away, but I'm still interested in what exactly
> was slowing my first approach. Was it the way I accessed the file, that is,
> is t.colinstances[ind] slow? Or was it that directly building the set is
> slower than using .add()? The difference is huge, as my impressions and
> your benchmarks showed.

That's a good question. As I was not certain about what was happening there,
I've done some profiling. Here are the routines that consumed the most time
in your first method:

Tue May 12 14:07:25 2009    tuniq1.prof

         2401085 function calls (2401062 primitive calls) in 5.835 CPU seconds

   Ordered by: internal time, call count
   List reduced from 184 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    50000    2.788    0.000    3.092    0.000 {method '_fillCol' of 'tables.tableExtension.Row' objects}
    50000    0.442    0.000    3.569    0.000 table.py:1496(_read)
   100000    0.313    0.000    0.861    0.000 leaf.py:425(_processRange)
150030/150010    0.253    0.000    0.491    0.000 file.py:880(_getNode)
    50005    0.241    0.000    5.759    0.000 table.py:2914(__getitem__)
   150025    0.220    0.000    0.236    0.000 file.py:249(__getitem__)
    50000    0.209    0.000    4.822    0.000 table.py:1553(read)

It is clear now that a `Table.__getitem__()` call was issued for every
*single* item in the table; those calls account for 5.759 of the 5.835
seconds in cumulative time. As this is a user-accessible method, it first has
to do a lot of checks to ensure that the user is requesting a valid item, and
this carries a lot of overhead.

In comparison, the second method uses a table iterator, which is implemented
as an extension (i.e. it is fast) and, besides, only performs checks at the
beginning. Also, by using the iterator you only have to read each item once
per run, instead of once per existing column (remember that tables are
implemented row-wise, and you were accessing items column-wise in method1).
Finally, table iterators always do buffered I/O, so data is read ahead and
reused in subsequent iterations. All in all, this approach is much faster.

The moral of this is: use table iterators whenever you can :)

Cheers,

-- 
Francesc Alted
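
For illustration, the single-pass, row-wise pattern described above can be
sketched roughly as follows. This is not the code from the thread: the helper
name `unique_per_column` and the list-of-dicts stand-in for a table are
assumptions made so the example runs without an HDF5 file. The only
requirement is that iterating `rows` yields objects that can be indexed by
column name.

```python
def unique_per_column(rows, colnames):
    """Collect the distinct values of each column in one pass over the rows.

    `rows` is any iterable of row-like objects supporting row[name].
    (Hypothetical helper, written for this illustration only.)
    """
    uniques = {name: set() for name in colnames}
    for row in rows:                      # each row is visited exactly once
        for name in colnames:             # all columns are handled per row
            uniques[name].add(row[name])  # grow the sets with .add()
    return uniques


# Stand-in data: a plain list of dicts plays the role of the table here.
rows = [
    {"a": 1, "b": "x"},
    {"a": 2, "b": "x"},
    {"a": 1, "b": "y"},
]
result = unique_per_column(rows, ["a", "b"])
print(result)
```

With a PyTables table, something like `unique_per_column(table,
table.colnames)` would presumably follow the same pattern, since iterating a
table yields row objects indexable by column name; that call has not been
checked against the file discussed in this thread.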