Dear Francesc, thank you for your reply. I'll try to better explain my problem using real examples of data and code.
As I wrote, I start with an input file. It contains a string of variable length (10e7-10e8). This string consists of four different characters (A, C, G, T), the bases of a DNA molecule. The format of the input file is:

    >scaffold_0
    AGCAGTGACAGATGACAGATGACAGATGACAGTGAC
    AGCAGTGACAGATGACAGATGACAGATGACAGTGAC
    AGCAGTGACAGATGACAGATGACAGATGACAGTGAC
    ... until 10e8 characters

Each character, or base, can be associated with a specific position. The first character, A, has position 1, the following G has position 2, and so on. Using pytables I can store all characters base by base in a structure like the following:

    (1, A)
    (2, G)
    ... and so on

Then I have a second file that contains other strings and their related positions. While reading this file, I have to update the table according to the position. For example, I read that at position 2 there is another G, at position 3 a C, and at position 1 a G. According to the position I can associate:

    (1, A) --> G
    (2, G) --> G
    (3, C) --> C

The same position can be read more than once, a variable number of times:

    (1, A) --> GGGGAAAAAAAAAAA
    (2, G) --> GGGGGGCGGG
    (3, C) --> CCCCC

I cannot predict a priori the number of characters to associate with each position. As you suggested, I tried to use a vlarray. In practice, while generating the table I also build the vlarray in order to initialize the structure.
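To make the update step concrete, here is a minimal pure-Python sketch of it, independent of pytables. The function names (read_bases, collect_observations) and the tiny in-memory inputs are hypothetical, made up only for illustration. With the real data, this in-memory approach would probably be too large to be practical.

```python
from collections import defaultdict
from io import StringIO


def read_bases(lines):
    """Yield (position, base) pairs, 1-based, skipping '>' header lines."""
    pos = 1
    for line in lines:
        if line.startswith(">"):
            continue
        for base in line.strip():
            yield pos, base
            pos += 1


def collect_observations(updates):
    """Group observed bases by position.

    `updates` is any iterable of (position, base) pairs, in any order;
    the same position may appear a variable number of times.
    """
    observed = defaultdict(list)
    for pos, base in updates:
        observed[pos].append(base)
    return observed


# Hypothetical miniature inputs standing in for the two real files.
reference_file = StringIO(">scaffold_0\nAGC\nAGT\n")
updates = [(2, "G"), (3, "C"), (1, "G"), (1, "A"), (2, "G")]

reference = dict(read_bases(reference_file))
observed = collect_observations(updates)

# Print each reference base next to the bases observed at that position.
for pos in sorted(reference):
    print(pos, reference[pos], "-->", "".join(observed.get(pos, [])))
```

Positions that never appear in the second file simply get an empty list, which is the variable-length behaviour I am trying to reproduce on disk.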
The code I tried is the following:

    from tables import *
    from numpy import *

    class NucSeq(IsDescription):
        id = Int32Col(pos=1)        # integer
        gnuc = StringCol(1, pos=2)  # 1-character String

    # Open a file in "w"rite mode
    fileh = openFile("table1.h5", mode = "w")
    root = fileh.root
    # Create a new group
    group = fileh.createGroup(root, "newgroup")
    # Create a new table in newgroup group
    tableNuc = fileh.createTable(group, 'tableNuc', NucSeq, "tableNuc", Filters(1))
    nucseq = tableNuc.row
    vlarray = fileh.createVLArray(root, 'vlarray', StringAtom(itemsize=1), "vlarray test")
    f=open("seq")
    x=1
    for i in f:
        if i[0]!=">":
            l=i.strip()
            for j in l:
                nucseq['id']=x
                nucseq['gnuc']=j
                nucseq.append()
                vlarray.append([])
                x+=1
    f.close()
    tableNuc.flush()
    fileh.close()

If I remove the vlarray, pytables can build the table in several seconds. With the vlarray, the time increases and the same job takes more than 20 hours to complete. In the code above I preferred to initialize the structure because afterwards I can quickly add each character by addressing its specific position.

If you need it, I can provide the "seq" file (it is 4 MB after compression).

Thank you very much in advance for any help and suggestions.

Ernesto

PS: sorry for the late answer, but I don't receive the replies directly. I don't know why.