On Wednesday 27 October 2010 15:38:28 Gaetan de Menten wrote:
> Hi,
>
> I have a table with ~300 columns and ~150,000 rows and I need to copy
> it from one file to another.
>
> However, the simplest methods I could find:
> - input_file.copyNode(...)
> - input_file.root.test_table.copy(output_file.root)
> or even:
> - input_file.copyFile(output_path)
>
> are all slow as hell: they take more than 1 min, while a simple:
>
> data = in_table.read()
> out_table.append(data)
> out_table.flush()
>
> takes only 1.88s, and copying in chunks of 10000 rows takes 1.34s.
>
> FWIW, no compression whatsoever is used in any of those cases, and
> using it does not reduce the copy time.
>
> That behavior does not show up with a small number of columns, but
> the problem seems to grow geometrically with the number of columns.
> Is there a setting somewhere that could alleviate this problem, or is
> it a known limitation or a bug?
After investigating this, I have come to the conclusion that the overhead comes from PyTables copying a couple of attributes per column (namely FIELD_N_NAME and FIELD_N_FILL, where N is the column number). I suspect that the ultimate culprit is an inefficiency in HDF5 when dealing with these attributes (I should investigate more, though), so in the meantime I have decided not to copy these attributes during `Table.copy()` operations. With this change, performance is good now. More info:

http://pytables.org/trac/ticket/304

Anyway, I'm a bit fed up with these FIELD_N_NAME and FIELD_N_FILL attributes, which are not really useful (except in some rare cases). So I'm thinking of removing them completely for PyTables 2.3, see:

http://pytables.org/trac/ticket/305

If anyone is against this, please speak now or forever hold your peace! (I'll also announce this in a proper thread.)

-- 
Francesc Alted

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users