On 19 Jan 2005, [EMAIL PROTECTED] wrote:

> I have two lists:
>
> 1. Lseq:
>
> >>> len(Lseq)
> 30673
> >>> Lseq[20:25]
> ['NM_025164', 'NM_025164', 'NM_012384', 'NM_006380',
> 'NM_007032','NM_014332']
>
>
> 2. refseq:
> >>> len(refseq)
> 1080945
> >>> refseq[0:25]
> ['>gi|10047089|ref|NM_014332.1| Homo sapiens small
> muscle protein, X-linked (SMPX), mRNA',
> 'GTTCTCAATACCGGGAGAGGCACAGAGCTATTTCAGCCACATGAAAAGCATCGGAATTGAGATCGCAGCT',
> 'CAGAGGACACCGGGCGCCCCTTCCACCTTCCAAGGAGCTTTGTATTCTTGCATCTGGCTGCCTGGGACTT',
[...]
> 'ACTTTGTATGAGTTCAAATAAATATTTGACTAAATGTAAAATGTGA',
> '>gi|10047091|ref|NM_013259.1| Homo sapiens neuronal
> protein (NP25), mRNA',
[...]
> If Lseq[i] is present in refseq[k], then I am
> interested in printing starting from refseq[k] until
> the element that starts with '>' sign.
>
> my Lseq has NM_014332 element and this is also present
> in second list refseq. I want to print starting from
> element where NM_014332 is present until next element
> that starts with '>' sign.

> I could not think of any smart way to do this,
> although I have tried like this:

I give you the same answer I think you got the last time you asked
such a question: use a dictionary if you want to search for items.

So how to do it? You could build a dictionary from refseq where the
elements that can match the elements from Lseq are the keys.

Then you iterate over Lseq, look whether you find a key in your
dictionary, and if so print the matching elements from the list.

The next function creates a dictionary. The keys are the NM_... entries;
the values are the start and end indices of the corresponding records.
Note that the header lines carry a version suffix (NM_014332.1) while
the entries in Lseq do not (NM_014332), so the suffix is cut off when
the key is built.

def build_dic(seq):
    keys = []
    indices = []
    for ind, entry in enumerate(seq):
        if entry.startswith('>'):
            # '>gi|10047089|ref|NM_014332.1| ...' -> 'NM_014332'
            key = entry.split('|')[3].split('.')[0]
            keys.append(key)
            indices.append(ind)
    # the last record ends at the end of the list
    indices.append(len(seq))
    return dict(zip(keys, zip(indices, indices[1:])))

With that function you search for matching keys, and if a match is found
you use the start and end index to extract the right elements from the
list.

def find_matching(rseq, lseq):
    d = build_dic(rseq)
    for key in lseq:
        if key in d:
            start, end = d[key]
            print(rseq[start:end])


   Karl
-- 
Please do *not* send copies of replies to me.
I read the list
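
A minimal sketch of the same dictionary idea on a toy data set, assuming Python 3. Instead of index pairs it stores the lines of each record directly. The helper names `accession` and `group_records` and the shortened sequence lines are made up for illustration; they are not from the original post.

```python
def accession(header):
    """Return the accession without its version suffix, e.g.
    'NM_014332' from '>gi|10047089|ref|NM_014332.1| Homo sapiens ...'."""
    return header.split('|')[3].split('.')[0]


def group_records(seq):
    """Map each accession to the lines of its record:
    the '>' header line followed by its sequence lines."""
    records = {}
    current = None
    for entry in seq:
        if entry.startswith('>'):
            # a header line starts a new record
            current = []
            records[accession(entry)] = current
        if current is not None:
            current.append(entry)
    return records


# Toy data: sequence lines are shortened for the example.
refseq = [
    '>gi|10047089|ref|NM_014332.1| Homo sapiens small muscle protein, X-linked (SMPX), mRNA',
    'GTTCTCAATACC',
    'CAGAGGACACCG',
    '>gi|10047091|ref|NM_013259.1| Homo sapiens neuronal protein (NP25), mRNA',
    'ACTTTGTATGAG',
]
Lseq = ['NM_025164', 'NM_014332']

records = group_records(refseq)
for key in Lseq:
    if key in records:
        print('\n'.join(records[key]))
```

The dictionary is built in a single pass over refseq, so each of the lookups for the entries in Lseq is a plain dictionary access rather than another scan of the whole list.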