Re: [Tutor] sorting and editing large data files

Rich Krauter Thu, 16 Dec 2004 05:51:00 -0800

Scott Melnyk wrote:

Hello!

I recently suffered a loss of programming files (and I had been
putting off my backups...)

[snip]


#regular expression to pull out gene, transcript and exon ids

info=re.compile(r'^(ENSG\d+\.\d).+(ENST\d+\.\d).+(ENSE\d+\.\d)+')
#above matches gene, transcript, then one or more exons (only the last exon is captured)
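A quick way to check what that regex actually captures is to run it on one line. A repeated capturing group keeps only its last repetition, and here the greedy .+ before it has already swallowed the earlier exons, so group 3 holds a single exon id. This is a minimal sketch that assumes a made-up input line in the whitespace-separated layout getintersections() splits (gene, transcript, then exons). It uses re.findall as one way to collect every exon.

```python
import re

# Scott's pattern, written as a raw string so "\d" and "\." reach the
# regex engine unchanged.
info = re.compile(r'^(ENSG\d+\.\d).+(ENST\d+\.\d).+(ENSE\d+\.\d)+')

# Hypothetical input line: gene id, transcript id, then three exon ids.
line = "ENSG00000001.1 ENST00000001.1 ENSE00000001.1 ENSE00000002.1 ENSE00000003.1"

m = info.match(line)
# The repeated group only remembers its last match, so only the final
# exon id survives in group 3.
print(m.groups())
# ('ENSG00000001.1', 'ENST00000001.1', 'ENSE00000003.1')

# To get every exon, collect them separately with findall.
exons = re.findall(r'ENSE\d+\.\d', line)
print(exons)
# ['ENSE00000001.1', 'ENSE00000002.1', 'ENSE00000003.1']
```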


#TFILE = open(sys.argv[1], 'r')     #file to read the various transcripts from
WFILE=open(sys.argv[1], 'w')        #file to write to; careful, 'w' will
                                    #overwrite old info in file
W2FILE=open(sys.argv[2], 'w')       #this file will have the names of
                                    #redundant exons
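WFILE opens sys.argv[1] with 'w'. That is the same path the commented-out TFILE line reads from. Opening a file in 'w' mode truncates it immediately, so re-enabling TFILE as written would find the input already emptied. Below is a minimal sketch of keeping input and output separate. It assumes a hypothetical helper, filter_ens_lines(), and uses throwaway files in a temporary directory rather than the real data files.

```python
import os
import tempfile

def filter_ens_lines(in_path, out_path):
    """Hypothetical helper: copy lines starting with 'ENS' to a new file.

    Returns the number of lines written. Refuses to run if both paths
    resolve to the same file, since opening out_path with 'w' would
    empty the input before it is read.
    """
    if os.path.realpath(in_path) == os.path.realpath(out_path):
        raise ValueError("input and output must be different files")
    written = 0
    # 'with' closes both files even if an exception is raised.
    with open(in_path) as src, open(out_path, 'w') as dst:
        for line in src:
            if line.startswith('ENS'):
                dst.write(line)
                written += 1
    return written

with tempfile.TemporaryDirectory() as tmpdir:
    in_path = os.path.join(tmpdir, 'grouped.txt')
    out_path = os.path.join(tmpdir, 'filtered.txt')
    with open(in_path, 'w') as f:
        f.write("some header\nENSG1.1 ENST1.1 ENSE1.1 ENSE2.1\n")

    written = filter_ens_lines(in_path, out_path)

    # Merely opening a file in 'w' mode truncates it, before any write.
    with open(in_path, 'w'):
        pass
    truncated_size = os.path.getsize(in_path)

print(written, truncated_size)  # 1 0
```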
import sets
def getintersections(fname=r'Z:\datasets\h35GroupedDec15b.txt'):
        exonSets = {}
        f = open(fname)
        for line in f:
            if line.startswith('ENS'):
                parts = line.split()
                gene = parts[0]
                transcript = parts[1]
                exons = parts[2:]
                exonSets.setdefault(gene,
                         sets.Set(exons)).intersection(sets.Set(exons))

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

        return exonSets


Hi Scott,

There may be other problems, but here's one thing I noticed:

exonSets.setdefault(gene,
    sets.Set(exons)).intersection(sets.Set(exons))

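discards its result, because intersection() returns a new set and leaves the set stored in exonSets unchanged. It should be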

exonSets.setdefault(gene,
    sets.Set(exons)).intersection_update(sets.Set(exons))
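In Python 2.4 and later, including Python 3, the built-in set type has the same intersection_update() method. The sets module was deprecated in 2.6 and is gone in Python 3, so it isn't needed. Here is a minimal sketch of the corrected loop, assuming the gene/transcript/exons layout above. The function name get_intersections and the sample lines are hypothetical. It takes an iterable of lines instead of a filename so it can be tried without the data file.

```python
def get_intersections(lines):
    """Return {gene: set of exons shared by every transcript of that gene}.

    Expects whitespace-separated lines: gene id, transcript id, exon ids.
    """
    exon_sets = {}
    for line in lines:
        if line.startswith('ENS'):
            parts = line.split()
            gene, exons = parts[0], parts[2:]
            # setdefault stores set(exons) the first time a gene is seen;
            # intersection_update then shrinks the stored set in place.
            exon_sets.setdefault(gene, set(exons)).intersection_update(exons)
    return exon_sets

# Hypothetical sample: two transcripts of one gene that share exon ENSE1.1.
sample = [
    "ENSG1.1 ENST1.1 ENSE1.1 ENSE2.1",
    "ENSG1.1 ENST2.1 ENSE1.1 ENSE3.1",
]
print(get_intersections(sample))  # {'ENSG1.1': {'ENSE1.1'}}

# For contrast: intersection() builds a new set and leaves the original alone.
a = {'ENSE1.1', 'ENSE2.1'}
result = a.intersection({'ENSE1.1', 'ENSE3.1'})
print(a, result)
```

With a real file, passing the open file object works the same way, e.g. `with open(fname) as f: get_intersections(f)`.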

Hope that helps.

Rich

_______________________________________________
Tutor maillist  -  [EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/tutor
