On 10/31/12 4:02 PM, Francesc Alted wrote:
> On 10/31/12 10:12 AM, Andrea Gavana wrote:
>> Hi Francesc & All,
>>
>> On 31 October 2012 14:13, Francesc Alted wrote:
>>> On 10/31/12 4:30 AM, Andrea Gavana wrote:
>>>> Thank you for all your suggestions. I managed to slightly modify the
>>>> script you attached and I am also experimenting with compression.
>>>> However, in the newly attached script the underlying table is not
>>>> modified, i.e., this assignment:
>>>>
>>>> for p in table:
>>>>     p['results'][:NUM_SIM, :, :] = numpy.random.random(size=(NUM_SIM, len(ALL_DATES), 7))
>>>> table.flush()
>>> For modifying row values you need to assign a complete row object.
>>> Something like:
>>>
>>> for i in range(len(table)):
>>>     myrow = table[i]
>>>     myrow['results'][:NUM_SIM, :, :] = numpy.random.random(size=(NUM_SIM, len(ALL_DATES), 7))
>>>     table[i] = myrow
>>>
>>> You may also use Table.modifyColumn() for better efficiency. Look at
>>> the different modification methods here:
>>>
>>> http://pytables.github.com/usersguide/libref/structured_storage.html#table-methods-writing
>>>
>>> and experiment with them.
>> Thank you, I have tried different approaches and they all seem to run
>> more or less at the same speed (see below). I had to slightly modify
>> your code from:
>>
>>     table[i] = myrow
>>
>> to
>>
>>     table[i] = [myrow]
>>
>> to avoid exceptions.
>>
>> In the newly attached file, I switched to blosc for compression (but
>> with compression level 1) and ran a few sensitivity tests.
>> By calling the attached script as:
>>
>>     python pytables_test.py NUM_SIM
>>
>> where "NUM_SIM" is an integer, I get the following timings and file
>> sizes:
>>
>> C:\MyProjects\Phaser\tests>python pytables_test.py 10
>> Number of simulations   : 10
>> H5 file creation time   : 0.879s
>> Saving results for table: 6.413s
>> H5 file size (MB)       : 193
>>
>> C:\MyProjects\Phaser\tests>python pytables_test.py 100
>> Number of simulations   : 100
>> H5 file creation time   : 4.155s
>> Saving results for table: 86.326s
>> H5 file size (MB)       : 1935
>>
>> I don't think I will try the 1,000 simulations case :-) . I believe I
>> still don't understand what the best strategy would be for my problem.
>> I basically need to save all the simulation results for all the 1,200
>> "objects", each of which has a 600x7 time-series matrix. In the GUI I
>> have, these 1,200 "objects" are grouped into multiple categories, and
>> multiple categories can reference the same "object", i.e.:
>>
>> Category_1: object_1, object_23, object_543, etc...
>> Category_2: object_23, object_100, object_543, etc...
>>
>> So my idea was to save all the "objects" results to disk and, upon the
>> user's choice, build the category results "on the fly", i.e. by
>> searching the H5 file on disk for the "objects" belonging to that
>> specific category and summing up all their results (over time, i.e.
>> the 600 time-steps). Maybe I would be better off with a 4D array
>> (NUM_OBJECTS, NUM_SIM, TSTEPS, 7) as a table, but then I would lose the
>> ability to reference the "objects" by their names...
>
> You should keep experimenting with different approaches to discover
> the one that works best for you. Regarding using the 4D array as a
> table, I might be misunderstanding your problem, but you can still
> reference objects by name by using:
>
>     row = table.where("name == %s" % my_name)
>     table[row.nrow] = ...
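A minimal sketch of looking objects up by name, assuming PyTables 3.x, which renamed the 2.x methods used in this thread (readWhere() became read_where(), getWhereList() became get_where_list(), readCoordinates() became read_coordinates(); where() and Row.update() keep their names). Taken literally, the lookup snippets do not run: Table.where() returns an iterator of Row objects rather than a single row, read_where() returns a NumPy record array that has no nrow attribute, and "name == %s" % my_name leaves the name unquoted, so it is parsed as a variable name rather than a string. Passing the value through condvars avoids the quoting problem. Calling Row.update() inside a where() loop writes a modified row back in place; the original `for p in table:` loop never calls update(), so nothing is written back. The table layout, the small cell shape, and the helper names (SimResult, build_table, set_results, category_total) are hypothetical.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical cell shape standing in for (NUM_SIM, 600 time steps, 7 variables).
CELL_SHAPE = (2, 5, 7)


class SimResult(tables.IsDescription):
    """One row per object: its name plus all of its simulation results."""
    name = tables.StringCol(16)
    results = tables.Float64Col(shape=CELL_SHAPE)


def build_table(h5, num_objects):
    """Create the table with one row per object; object i is filled with the value i."""
    table = h5.create_table("/", "objects", SimResult)
    rows = np.zeros(num_objects, dtype=table.dtype)
    rows["name"] = [("object_%d" % i).encode() for i in range(num_objects)]
    rows["results"] = np.arange(num_objects, dtype=float)[:, None, None, None]
    table.append(rows)
    table.flush()
    return table


def set_results(table, name, cell):
    """Overwrite the results of the object called `name` (string columns hold bytes)."""
    for row in table.where("name == target", condvars={"target": name.encode()}):
        row["results"] = cell
        row.update()  # without this call the change never reaches the file
    table.flush()


def category_total(table, names):
    """Sum the results of every object in a category, keeping the time axis."""
    coords = []
    for name in names:
        coords.extend(table.get_where_list("name == target",
                                           condvars={"target": name.encode()}))
    cells = table.read_coordinates(sorted(coords), field="results")
    return cells.sum(axis=0)  # shape CELL_SHAPE


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "objects.h5")
    with tables.open_file(path, mode="w") as h5:
        table = build_table(h5, num_objects=4)
        set_results(table, "object_3", np.full(CELL_SHAPE, 10.0))
        total = category_total(table, ["object_1", "object_3"])
        print(total.shape, total.max())  # (2, 5, 7) 11.0
```

With only about 1,200 objects, another option is to read the name column once into a dict of name to row number and skip the per-name queries; which is faster for a given category size is worth measuring rather than assuming.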
Uh, I rather meant:

    row = table.readWhere("name == %s" % my_name)
    table[row.nrow] = ...

but you probably got the idea already.

-- 
Francesc Alted

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users