kumar s wrote:
My situation:

I have a list of numbers that I have to match in
another list and write them to a new file:

List 1: range_cors

range_cors[1:5]

['161:378', '334:3', '334:4', '65:436']

List 2: seq

seq[0:2]

['>probe:HG-U133A_2:1007_s_at:416:177; Interrogation_Position=3330; Antisense;', 'CACCCAGCTGGTCCTGTGGATGGGA']


A slow method:

sequences = []
for elem1 in range_cors:
    for index,elem2 in enumerate(seq):
        if elem1 in elem2:
            sequences.append(elem2)
            sequences.append(seq[index+1])

This process is very slow and it is taking a lot of
time. I am not happy.

It looks like you really only want to search every other element of seq. You could speed your loop up by using an explicit iterator:
  for elem1 in range_cors:
    i = iter(seq)
    try:
      # Take the lines two at a time until the iterator is exhausted
      while True:
        tag, data = i.next(), i.next()
        if elem1 in tag:
          sequences.append(tag)
          sequences.append(data)
    except StopIteration:
      pass
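
For readers on Python 3, where i.next() is spelled next(i), here is a minimal sketch of the same idea. It swaps the try/except loop for zip(i, i), which pulls two items at a time from one iterator. The function name collect_matches and the second probe entry in the sample data are made up for illustration; only the first tag/data pair comes from the post.

```python
def collect_matches(range_cors, seq):
    """Return tag and data lines whose tag contains any entry of range_cors."""
    sequences = []
    for elem1 in range_cors:
        i = iter(seq)
        # zip(i, i) draws two consecutive items from the same iterator,
        # so each step yields one (tag, data) pair and data lines are
        # never searched.
        for tag, data in zip(i, i):
            if elem1 in tag:
                sequences.append(tag)
                sequences.append(data)
    return sequences


seq = [
    '>probe:HG-U133A_2:1007_s_at:416:177; Interrogation_Position=3330; Antisense;',
    'CACCCAGCTGGTCCTGTGGATGGGA',
    # Hypothetical second entry, invented so that something matches.
    '>probe:HG-U133A_2:1053_at:161:378; Interrogation_Position=1; Antisense;',
    'ACGTACGTACGTACGTACGTACGTA',
]

matches = collect_matches(['161:378', '65:436'], seq)
print(matches)
```

This still scans seq once per entry in range_cors, so it only halves the work of the original loop.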


You don't say how long the sequences are. If range_cors is short enough you can use a single regex to do the search. (I don't actually know how short range_cors has to be or how this will break down if it is too long; this will probably work with 100 items in range_cors; it may only be limited by available memory; it may become slow to compile the regex when range_cors gets too big...) This will eliminate your outer loop entirely and I expect a substantial speedup. The code would look like this:

 >>> import re
 >>> range_cors = ['161:378', '334:3', '334:4', '65:436']

Make a pattern by escaping special characters in the search string, and joining 
them with '|':
 >>> pat = '|'.join(map(re.escape, range_cors))
 >>> pat
'161\\:378|334\\:3|334\\:4|65\\:436'
 >>> pat = re.compile(pat)

Now you can use pat.search() to find matches:
 >>> pat.search('123:456')
 >>> pat.search('aaa161:378')
<_sre.SRE_Match object at 0x008DC8E0>
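
One thing to watch with a plain substring or regex search: '334:3' is also found inside a tag ending in ':334:30;'. If that matters for your data, a possible fix is to anchor the pattern on the surrounding punctuation. The sketch below assumes the coordinates always appear as ':X:Y;' at the end of the probe name, which is a guess based on the one tag shown in the post; the sample tag is made up.

```python
import re

range_cors = ['161:378', '334:3', '334:4', '65:436']
alternatives = '|'.join(map(re.escape, range_cors))

# Unanchored pattern, as in the session above.
loose = re.compile(alternatives)

# Assumption: coordinates are preceded by ':' and followed by ';'.
anchored = re.compile(':(?:' + alternatives + ');')

# Hypothetical tag whose coordinates are 334:30, not 334:3.
tag = '>probe:HG-U133A_2:1007_s_at:334:30; Interrogation_Position=3330; Antisense;'

loose_hit = loose.search(tag) is not None
anchored_hit = anchored.search(tag) is not None
print(loose_hit, anchored_hit)
```

Here the loose pattern reports a match on '334:3' while the anchored one does not.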

The complete search loop would look like this:

  i = iter(seq)
  try:
    # Take the lines two at a time until the iterator is exhausted
    while True:
      tag, data = i.next(), i.next()
      if pat.search(tag):
        sequences.append(tag)
        sequences.append(data)
  except StopIteration:
    pass
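
Putting the pieces together, a self-contained Python 3 version might look like the sketch below. The function name filter_probes and the second probe entry are hypothetical; next(i) replaces the older i.next() spelling. On Python 3.7 and later, re.escape no longer escapes ':', so pat.pattern will print differently from the session above, but it matches the same strings.

```python
import re


def filter_probes(range_cors, seq):
    """Return tag and data lines whose tag matches any entry of range_cors."""
    if not range_cors:
        # An empty alternation would compile to '' and match every tag.
        return []
    pat = re.compile('|'.join(map(re.escape, range_cors)))
    sequences = []
    i = iter(seq)
    try:
        while True:
            # seq alternates tag line, data line; take them in pairs.
            tag, data = next(i), next(i)
            if pat.search(tag):
                sequences.append(tag)
                sequences.append(data)
    except StopIteration:
        # Raised by next(i) once seq is exhausted.
        pass
    return sequences


seq = [
    '>probe:HG-U133A_2:1007_s_at:416:177; Interrogation_Position=3330; Antisense;',
    'CACCCAGCTGGTCCTGTGGATGGGA',
    # Hypothetical second entry, invented so that something matches.
    '>probe:HG-U133A_2:1053_at:65:436; Interrogation_Position=1; Antisense;',
    'ACGTACGTACGTACGTACGTACGTA',
]

result = filter_probes(['161:378', '334:3', '334:4', '65:436'], seq)
print(result)
```

seq is walked only once here, however many entries range_cors has.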

Kent




A faster method (probably):


for i in range(len(range_cors)):
    for index,m in enumerate(seq):
        pat = re.compile(i)
        if re.search(pat,seq[m]):
            p.append(seq[m])
            p.append(seq[index+1])


I am getting errors, because I am trying to create an
element as a pattern in re.compile().



Questions:

1. Is it possible to do this? If so, how can I do
it?


Can anyone help correct my piece of code and
suggest where I went wrong?


Thank you in advance.


-K


_______________________________________________
Tutor maillist - [email protected]
http://mail.python.org/mailman/listinfo/tutor

