Dear group,

I have two files in text format that look like this:

File a1.txt:
>a1
TTAATTGGAACA
>a2
AGGACAAGGATA
>a3
TTAAGGAACAAA

File b1.txt:
>b1
TTAATTGGAACA
>b2
AGGTCAAGGATA
>b3
AAGGCCAATTAA

I want to check whether the two files have any sequences (the ATGC strings) in common. Here a1 and b1 are identical sequences, and I want to select them and print their headers (the lines starting with the > symbol):

a1 '\t' b1

Here >XXXXX is called the header, and the line that follows the header line is the sequence. In bioinformatics, this is called FASTA format.

What I am doing is matching the sequences (these are always 25-mers in this instance) and, if they match, asking Python to write header + '\t' + header.

# a and b are the lists of lines from a1.txt and b1.txt
ak = a[1::2]    # sequences from a
av = a[::2]     # headers from a
seq_dict = dict(zip(ak, av))
**************************************
>>> seq_dict
{'TTAAGGAACAAA': '>a3', 'AGGACAAGGATA': '>a2', 'TTAATTGGAACA': '>a1'}
**************************************

bv = b[1::2]    # sequences from b
***************************************
>>> bv
['TTAATTGGAACA', 'AGGTCAAGGATA', 'AAGGCCAATTAA']

>>> for i in bv:
...     if i in seq_dict:
...         print(seq_dict[i])
...
>a1
***************************************

Here a1 is the only common element. However, I am having difficulty printing that b1 is identical to a1. How do I take b and do this search? It was easy for me to take the sequence part with b[1::2], but I want to print that the b1 header has the same sequence as a1:

a1 + '\t' + b1

Is there any way I can do this? This is very simple, but due to a brain block I am unable to work it out. Can anyone please help me out?

Thanks

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
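
One possible way to get both headers, offered only as a minimal sketch: keep each header in b paired with its sequence instead of slicing the sequences out on their own, then look each sequence up in the dictionary built from a. The helper names read_fasta and common_headers are made up for this illustration. The sketch assumes every record is exactly one header line followed by one sequence line, and that no sequence occurs twice in a (a repeated sequence would keep only the last header).

```python
def read_fasta(lines):
    """Return (header, sequence) pairs from alternating header/sequence lines.

    Assumes each record is one header line followed by one sequence line.
    """
    lines = [ln.strip() for ln in lines if ln.strip()]
    return list(zip(lines[::2], lines[1::2]))


def common_headers(a_lines, b_lines):
    """Return (header_a, header_b) pairs whose sequences are identical."""
    # Map each sequence in a to its header, like seq_dict in the post.
    seq_dict = {seq: header for header, seq in read_fasta(a_lines)}
    pairs = []
    # Walk b as (header, sequence) pairs so the b header is still at hand
    # when its sequence is found in seq_dict.
    for header_b, seq in read_fasta(b_lines):
        if seq in seq_dict:
            pairs.append((seq_dict[seq], header_b))
    return pairs


# The example data from the post, as lists of lines.
a = [">a1", "TTAATTGGAACA", ">a2", "AGGACAAGGATA", ">a3", "TTAAGGAACAAA"]
b = [">b1", "TTAATTGGAACA", ">b2", "AGGTCAAGGATA", ">b3", "AAGGCCAATTAA"]

for header_a, header_b in common_headers(a, b):
    print(header_a + "\t" + header_b)
```

With the example data this prints the >a1 and >b1 headers separated by a tab. To run it on the real files, a and b could be built with something like open("a1.txt").read().splitlines(), assuming the files follow the same two-lines-per-record layout.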