kumar s wrote:
dear tutors:
I have two files. I want to take the coordinates of a row in fileA and check whether they fall within the range of coordinates in fileB. If they do, I want to map them; otherwise, pass. Thanks
kumar

file a:
name     loc          x       y
a       4       40811596        40811620
b       4       40811619        40811643
c       4       40811649        40811673
d       4       40811734        40811758
e       4       40811797        40811821
f       4       40811817        40811841
g       4       40811895        40811919
h       4       40811938        40811962



file b:

                              zx       zy
z1      4       +       40810323        40812000
z2      4       +       40810323        40812000
z3      4       +       40810323        40812000
z4      4       +       40810323        40812000
z5      4       +       40810323        40812000
z6      4       +       40810323        40812000
z7      4       +       40810323        40812000
z8      4       +       40810323        40812000




I want to take coordinates x and y from each row in file a and check whether they are within the range zx to zy. If they are, I want to write both matched rows as a single tab-delimited row.

my code:

f1 = open('fileA','r')
f2 = open('fileB','r')
da = f1.read().split('\n')
dat = da[:-1]
ba = f2.read().split('\n')
bat = ba[:-1]


for m in dat:
        col = m.split('\t')
        for j in bat:
                cols = j.split('\t')
                if col[1] == cols[1]:
                        xc = int(cols[2])
                        yc = int(cols[3])
                        if int(col[2]) in xrange(xc,yc):
                                if int(col[3]) in xrange(xc,yc):
                                        print m+'\t'+j

output:
a       4       40811596        40811620    z1 4 +  40810323     40812000



This code is too slow. Could you experts help me speed up the script? Each file has over 50K rows and the script runs very slowly.

Suggestions:

Translate the values to integer outside the comparison loop.
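One way to read this suggestion, as a rough sketch: parse file b once, up front, so the comparison loop never calls int(). The helper name parse_regions is made up for illustration, and the column layout (name, loc, strand, zx, zy, tab-separated) is assumed from the sample data above.

```python
def parse_regions(lines):
    """Parse file-b style rows once into (loc, zx, zy, original_row) tuples."""
    regions = []
    for line in lines:
        row = line.rstrip('\n')
        cols = row.split('\t')
        # Assumed layout: name, loc, strand, zx, zy.
        # Blank lines and the header line are skipped.
        if len(cols) < 5 or not (cols[3].isdigit() and cols[4].isdigit()):
            continue
        regions.append((cols[1], int(cols[3]), int(cols[4]), row))
    return regions

# Two sample lines: one data row and one header row.
sample = ["z1\t4\t+\t40810323\t40812000\n", "\t\t\tzx\tzy\n"]
regions = parse_regions(sample)
```

The same idea applies to file a: convert x and y to integers once per row, before entering the inner loop.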

Test for >= lower value and <= upper value. xrange is overkill. Be aware of Python's shortcut:
lower <= x <= upper.
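A small sketch of the chained comparison, using the first rows of the sample data. The function name in_range is only illustrative. Note one difference from the original test: xrange(xc, yc) excludes yc, while lower <= x <= upper includes the upper bound.

```python
def in_range(value, lower, upper):
    """Return True if lower <= value <= upper (both ends inclusive)."""
    return lower <= value <= upper

zx, zy = 40810323, 40812000   # range from file b, row z1
x, y = 40811596, 40811620     # coordinates from file a, row a

# Both coordinates must fall inside the range for the rows to match.
both_inside = in_range(x, zx, zy) and in_range(y, zx, zy)
```

In Python 2, "value in xrange(lo, hi)" walks through the numbers one by one, so each test can take over a million steps for ranges this wide; the comparison above takes two.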

Iterate over the file objects directly instead of calling read().split('\n'):
for m in f1:
 ...
for j in f2:
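Putting the suggestions together might look something like the sketch below. It is not a drop-in replacement: the name match_rows is hypothetical, the column positions are assumed from the sample data, and it is written for Python 3 (io.StringIO stands in for the real files). One caveat: a file object can only be looped over once, so file b is read into a list first rather than being re-iterated inside the loop over file a.

```python
import io

def match_rows(file_a, file_b):
    """Yield 'rowA<TAB>rowB' for every file-a row lying inside a file-b range."""
    # Parse file b once, converting coordinates to int outside the main loop.
    regions = []
    for line in file_b:
        zrow = line.rstrip('\n')
        cols = zrow.split('\t')
        if len(cols) >= 5 and cols[3].isdigit() and cols[4].isdigit():
            regions.append((cols[1], int(cols[3]), int(cols[4]), zrow))

    for line in file_a:
        row = line.rstrip('\n')
        cols = row.split('\t')
        if len(cols) < 4 or not (cols[2].isdigit() and cols[3].isdigit()):
            continue  # blank line or header line
        loc, x, y = cols[1], int(cols[2]), int(cols[3])
        for zloc, zx, zy, zrow in regions:
            # Chained comparisons replace the xrange membership tests.
            if loc == zloc and zx <= x <= zy and zx <= y <= zy:
                yield row + '\t' + zrow

# In-memory stand-ins for open('fileA') and open('fileB').
file_a = io.StringIO("name\tloc\tx\ty\na\t4\t40811596\t40811620\nb\t5\t1\t2\n")
file_b = io.StringIO("z1\t4\t+\t40810323\t40812000\n")
matches = list(match_rows(file_a, file_b))
```

This still compares every row of file a against every row of file b; it only makes each comparison cheap.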

--
Bob Gailer
Chapel Hill NC
919-636-4239