Hi,
Suppose I have a string such as this:
'aabccccccefggggghiiijkr'

I would like to print out all the positions that are flanked by a run
of symbols.
So, for example, I would like the output for the above input to be as
follows:

3   b   -1  aa
3   b   1   cccccc
10  e   -1  cccccc
11  f   1   ggggg
17  h   1   iii
17  h   -1  ggggg

where the first column is the (1-based) position of interest, the next
column is the entry at that position, the third column is 1 if the run
in the last column comes after that position and -1 if the run comes
before it, and the last column is the run itself.

I can do this easily in the forward direction (shown below), but it is
not clear to me how to do it backwards.

I would really appreciate it if someone can help with this problem.

I feel like a regex solution would be possible but I am not too good
with regex.
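
One possible regex approach, as a minimal sketch only: the pattern
(.)\1+ matches a run of two or more identical symbols, and the symbols
just before and just after each match are the flanked positions. The
function name flanked_positions is made up for illustration, positions
are assumed to be 1-based, and 1 / -1 mean "run comes after" / "run
comes before" as described above.

```python
import re

# (.)\1+ : any symbol followed by one or more copies of itself
RUN = re.compile(r"(.)\1+")

def flanked_positions(seq):
    """Return (position, symbol, direction, run) tuples.

    position is 1-based; direction is 1 when the run follows the
    symbol and -1 when the run precedes it.
    """
    rows = []
    for m in RUN.finditer(seq):
        start, end = m.span()          # 0-based, end is exclusive
        if start > 0:                  # symbol just before the run
            rows.append((start, seq[start - 1], 1, m.group()))
        if end < len(seq):             # symbol just after the run
            rows.append((end + 1, seq[end], -1, m.group()))
    return rows

for row in flanked_positions('aabccccccefggggghiiijkr'):
    print("%s\t%s\t%s\t%s" % row)
```

For the example string this gives rows for b, e, f and h, and also a
row for j at position 21, since j directly follows the run iii.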

The code for the forward direction is as follows:

def homopolymericSites(Seq):
    Seq = Seq.upper()
    len_seq = len(Seq)
    i = 0
    while i < len_seq:
        bi = Seq[i]
        k = 1
        # extend over the run of symbols identical to Seq[i]
        while i + k < len_seq and Seq[i + k] == bi:
            k += 1
        i = i + k  # i now indexes the symbol right after the run
        if k > 1 and i < len_seq:  # homopolymer with a symbol after it
            id_of_chr_which_proceeds_homopolymer = Seq[i]  # note: not i+1
            pos_of_chr_which_proceeds_homopolymer = i + 1  # +1 to convert to 1-index notation
            id_of_homopolymer = Seq[i - 1]
            length_of_homopolymer = k

            print("%s\t%s/%s\t%s" % (
                pos_of_chr_which_proceeds_homopolymer,
                id_of_chr_which_proceeds_homopolymer,
                id_of_homopolymer,
                length_of_homopolymer))
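
One way the backward direction might be handled, sketched under
assumptions: run the same forward scan on the reversed string and map
each reported position p back to len(seq) - p + 1. The names
sites_after_runs and sites_before_runs are hypothetical, and the
forward scan is restated as a generator so the example is
self-contained.

```python
def sites_after_runs(seq):
    """Forward scan: yield (pos, symbol, run_symbol, run_length) for
    every symbol that directly follows a run; pos is 1-based."""
    n = len(seq)
    i = 0
    while i < n:
        k = 1
        while i + k < n and seq[i + k] == seq[i]:
            k += 1
        i += k                      # index of the symbol after the run
        if k > 1 and i < n:
            yield (i + 1, seq[i], seq[i - 1], k)

def sites_before_runs(seq):
    """Backward scan: forward-scan the reversed string, then convert
    positions back to 1-based positions in the original string."""
    n = len(seq)
    for pos, sym, run_sym, k in sites_after_runs(seq[::-1]):
        yield (n - pos + 1, sym, run_sym, k)

s = 'aabccccccefggggghiiijkr'
for row in sites_after_runs(s):
    print("%s\t%s\t-1\t%s" % (row[0], row[1], row[2] * row[3]))
for row in sorted(sites_before_runs(s)):
    print("%s\t%s\t1\t%s" % (row[0], row[1], row[2] * row[3]))
```

Reversing costs one extra copy of the string but avoids writing a
second, mirrored loop with its own boundary checks.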
-- 
http://mail.python.org/mailman/listinfo/python-list
