On Sun, 4 Dec 2005, Srinivas Iyyer wrote:
> Contr1 SPR-10 SPR-101 SPR-125 SPR-137 SPR-139 SPR-143
> contr2 SPR-1 SPR-15 SPR-126 SPR-128 SPR-141 SPR-148
> contr3 SPR-106 SPR-130 SPR-135 SPR-138 SPR-139 SPR-145
> contr4 SPR-124 SPR-125 SPR-130 SPR-139 SPR-144 SPR-148

Hi Srinivas,

I'd strongly recommend changing the data representation from a line-oriented view to a more structured one. Each line in your data above appears to describe a conceptual set of tuples:

    (control_number, spr_number)

For example, we can think of the line:

    Contr1 SPR-10 SPR-101 SPR-125 SPR-137 SPR-139 SPR-143

as an encoding for the set of tuples written below (the notation is mathematical and not meant to be interpreted as Python):

    { (Contr1, SPR-10),  (Contr1, SPR-101), (Contr1, SPR-125),
      (Contr1, SPR-137), (Contr1, SPR-139), (Contr1, SPR-143) }

I'm not sure if I'm seeing everything, but from what I can tell so far, your data cries out to be held in a relational database.

I agree with Kent: you do not need to "align" anything. If each element of a sequence is unique within that sequence, then your "alignment" problem turns into a simpler table-lookup problem. That is, if all your data looks like:

    1: A B D E
    2: A C F
    3: A B C D

where no line contains a repeated character, then that data can be transformed into a simple tabular representation, conceptually:

        | A | B | C | D | E | F |
      1 | x | x |   | x | x |   |
      2 | x |   | x |   |   | x |
      3 | x | x | x | x |   |   |

So unless there's something here that you're not telling us, there's no need for any complicated alignment algorithm: we start with an empty table and, for each tuple, check the corresponding entry. Then, when we need to look for common elements, we just scan across a row or down a column of the table. BLAST is cool, but, like regular expressions, it's not the answer to every string problem.

If you want to implement code to do the above, it's not difficult, but you really should use an SQL database for this.
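Here is a minimal sketch of the idea, assuming the data arrives as whitespace-separated lines like the ones quoted above (control name first, then SPR identifiers). It turns each line into (control, spr) tuples, builds the checkmark grid as a plain dict of sets, and then, following the SQL suggestion, loads the same tuples into an in-memory database using Python's standard sqlite3 module and asks which SPR numbers are shared by more than one control. The helper names and the table name "hits" are hypothetical, chosen only for illustration.

```python
import sqlite3
from collections import defaultdict

# Sample input in the format quoted above: control name, then SPR identifiers.
DATA = """\
Contr1 SPR-10 SPR-101 SPR-125 SPR-137 SPR-139 SPR-143
contr2 SPR-1 SPR-15 SPR-126 SPR-128 SPR-141 SPR-148
contr3 SPR-106 SPR-130 SPR-135 SPR-138 SPR-139 SPR-145
contr4 SPR-124 SPR-125 SPR-130 SPR-139 SPR-144 SPR-148
"""


def to_tuples(text):
    """Hypothetical helper: turn each line into (control, spr) tuples."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        control, sprs = fields[0], fields[1:]
        pairs.extend((control, spr) for spr in sprs)
    return pairs


def build_table(pairs):
    """Pure-Python version of the grid: each row (control) maps to the
    set of columns (SPRs) that are checked for it."""
    table = defaultdict(set)
    for control, spr in pairs:
        table[control].add(spr)
    return table


def shared_sprs(pairs):
    """Load the tuples into an in-memory SQLite table (hypothetical schema)
    and return {spr: sorted controls} for SPRs seen under 2+ controls."""
    conn = sqlite3.connect(":memory:")
    # One row per (control, spr) tuple; UNIQUE drops accidental repeats.
    conn.execute(
        "CREATE TABLE hits (control TEXT, spr TEXT, UNIQUE (control, spr))"
    )
    conn.executemany("INSERT OR IGNORE INTO hits VALUES (?, ?)", pairs)
    rows = conn.execute(
        """SELECT spr, group_concat(control, ',')
           FROM hits
           GROUP BY spr
           HAVING COUNT(*) > 1
           ORDER BY spr"""
    ).fetchall()
    conn.close()
    # group_concat does not guarantee an order, so sort each control list.
    return {spr: sorted(controls.split(",")) for spr, controls in rows}


if __name__ == "__main__":
    pairs = to_tuples(DATA)
    table = build_table(pairs)
    # Scanning a "column" of the grid: which controls have SPR-139 checked?
    print(sorted(c for c, row in table.items() if "SPR-139" in row))
    for spr, controls in shared_sprs(pairs).items():
        print(spr, controls)
```

With the sample data, the column scan reports Contr1, contr3 and contr4 for SPR-139, and the SQL query finds SPR-125, SPR-130, SPR-139 and SPR-148 as the shared entries. Note that no alignment is involved anywhere: once the data is a set of tuples, "common elements" is just a GROUP BY (or a set lookup). For real data you would presumably want a persistent database file rather than ":memory:".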
As a bioinformatician, you'd do well to learn SQL; otherwise, you'll end up trying to reinvent tools that have already been written for you. A good book on introductory relational database usage is "The Practical SQL Handbook: Using Structured Query Language" by Judith Bowman, Sandra Emerson, and Marcy Darnovsky.

Good luck to you.

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor