Hi Ernesto,

On Thursday 10 December 2009 21:19:00 Ernesto wrote:
> Dear all,
>
> I'm new to pytables. I would like to use this package to store
> biological data. In practice I have 2 input files.
> The first consists of a long string of characters. The length is in
> the range 10e7-10e8. I have to read this file character by character
> and store it in a table according to its position (counting from 1).
> For example:
>
> 1 A
> 2 C
> 3 G
> 4 T
> and so on.
>
> Then, I have to read a second file and during the reading I have to
> associate to each character of the previous table another character,
> according to the position. Example:
>
> 1 A A
> 2 C C
> 3 G G
> 4 T T
> and so on.
>
> However, the same position can occur a variable number of times and so
> I have to associate a variable number of characters to each position.
> Example:
>
> 1 A AAAAA
> 2 C CCCCTC
> 3 G GGGGAGGGG
> 4 T TGGGTGTTTTTTTT
> and so on.
>
> I tried to use a vlarray for each position, updating the array every
> time needed. However, I noted that the creation of the table according
> to the first structure above was very fast. Adding the vlarray I noted,
> instead, a dramatic performance reduction in terms of time (from
> seconds to many hours [I stopped the script before the conclusion]).
>
> Is there a way to speed up the process when there are vlarray?

Mmh, 10e7-10e8 variable-length rows is quite a lot indeed (btw, how many
entries do you have?). Perhaps it would be better to build the VLArray
entries in memory first and then write them. It would also be worth
trying a compressed EArray instead.

> I also tried to use the same table with a very long string size
> instead of vlarray but also in this case the time needed to build the
> table was very high.
>
> Since I don't know pytables very well, is there a way to improve the
> performance?

It would help if you could post a self-contained code example so that we
can experiment with it. Often, changing the approach to your problem is
much more effective than optimizing the use of PyTables containers.

-- 
Francesc Alted
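
For illustration only, the suggestion above (build the VLArray entries in
memory first, then write them) might be sketched roughly as follows. This
is not code from the thread. The function names, the node name "reads" and
the (position, character) input format are all assumptions made for the
example, and it uses the PyTables 3.x snake_case API (open_file,
create_vlarray), which postdates this message.

```python
def group_by_position(observations, n_positions):
    """Collect, in memory, every character observed at each 1-based position."""
    buckets = [[] for _ in range(n_positions)]
    for position, char in observations:
        buckets[position - 1].append(char)
    # One string per position; positions never observed yield an empty string.
    return ["".join(chars) for chars in buckets]


def write_vlarray(path, rows):
    """Write one variable-length row per position in a single sequential pass."""
    import tables  # PyTables; imported lazily so the grouping step works without it

    with tables.open_file(path, mode="w") as h5file:
        # "reads" is a hypothetical node name chosen for this sketch.
        vlarray = h5file.create_vlarray(
            h5file.root, "reads", tables.StringAtom(itemsize=1),
            title="Characters observed at each position")
        for row in rows:
            # Each row is appended exactly once, already complete.
            vlarray.append(list(row))


# Hypothetical (position, character) pairs, as might be read from the second file.
observations = [(1, "A"), (2, "C"), (1, "A"), (3, "G"), (2, "T")]
rows = group_by_position(observations, n_positions=4)
print(rows)  # ['AA', 'CT', 'G', '']
```

Calling write_vlarray("reads.h5", rows) would then store the rows, each one
written once instead of being updated repeatedly. With 10e7-10e8 positions
a plain list of Python lists is unlikely to fit in memory, so the grouping
would probably have to be done over one range of positions at a time.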