On Jun 5, 12:41 pm, claire morandin <claire.moran...@gmail.com> wrote:
> But I have a problem storing all size length to the value size as it is
> always comes back with the last entry.
> Could anyone explain to me what I am doing wrong and how I should set the
> values for each dictionary?

Your code has two for loops, one that reads ERCC.txt into a dict, and one that reads blast.txt into a dict. The first assigns to `transcript`, the second to `blasttranscript`. When the loops are finished, you're using the _last_ value set for both `transcript` and `blasttranscript`.

So, really, you want _three_ loops: two to load the files into dicts, then another to compare the two of them. If the transcripts in blast.txt are guaranteed to be a subset of ERCC.txt, then you could get away with two loops:

    # convenience function for splitting lines into values
    def get_transcript_and_size(line):
        columns = line.strip().split()
        return columns[0].strip(), int(columns[1].strip())

    # read in blast_file
    blast_transcripts = {}
    with open('transcript_blast.txt') as blast_file:
        # this is a context manager, it'll close the file when it's finished
        for line in blast_file:
            blasttranscript, blastsize = get_transcript_and_size(line)
            blast_transcripts[blasttranscript] = blastsize

    # read in ERCC and compare to blast
    with open('transcript_ERCC.txt') as ercc_file, \
         open('Not_sequenced_ERCC_transcript.txt', 'w') as unknown_transcript, \
         open('transcript_out.txt', 'w') as out_file:
        # this is called a _nested_ context manager, and requires 2.7+ or 3.1+
        # (`print >> f` is Python 2 syntax; on Python 3 use print(..., file=f))
        for line in ercc_file:
            ercctranscript, erccsize = get_transcript_and_size(line)
            if ercctranscript not in blast_transcripts:
                print >> unknown_transcript, ercctranscript
            else:
                is_ninety_percent = blast_transcripts[ercctranscript] >= 0.9*erccsize
                print >> out_file, ercctranscript, is_ninety_percent

I've cleaned up your code a bit, using more consistent naming and the same open/write procedures for all file access. Generally, any time you're repeating code, you should stick it into a function and use that instead, like the `get_transcript_and_size` function.

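If blast.txt is not guaranteed to be a subset of ERCC.txt, the three-loop version might look roughly like the sketch below. It is only an illustration: `load_sizes` and `compare_sizes` are names I made up, the sample lines are invented, and it uses the print-function style so it should run on 2.7 and 3.x alike. The two calls to `load_sizes` are the two loading loops, and the loop inside `compare_sizes` is the third.

```python
from __future__ import print_function  # same code on Python 2.7 and 3.x


def get_transcript_and_size(line):
    # split a whitespace-separated line into (name, size)
    columns = line.strip().split()
    return columns[0], int(columns[1])


def load_sizes(lines):
    # hypothetical helper: build {transcript: size} from an iterable of lines
    sizes = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        transcript, size = get_transcript_and_size(line)
        sizes[transcript] = size
    return sizes


def compare_sizes(ercc_sizes, blast_sizes, threshold=0.9):
    # hypothetical helper: the third loop, run after both dicts are loaded
    results = {}   # transcript -> True if blast size >= threshold * ERCC size
    missing = []   # ERCC transcripts with no blast entry
    for transcript, ercc_size in ercc_sizes.items():
        if transcript not in blast_sizes:
            missing.append(transcript)
        else:
            results[transcript] = blast_sizes[transcript] >= threshold * ercc_size
    return results, missing


# made-up sample data standing in for the two files
ercc_lines = ["transcript_A 1061", "transcript_B 1023", "transcript_C 523"]
blast_lines = ["transcript_A 1000", "transcript_B 500"]

ercc_sizes = load_sizes(ercc_lines)
blast_sizes = load_sizes(blast_lines)
results, missing = compare_sizes(ercc_sizes, blast_sizes)
print(results, missing)
```

With real files you would pass the open file objects to `load_sizes` instead of the lists, since a file iterates line by line in the same way.
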
If the columns in your two files are separated by tabs, or always by the same number of spaces, you can simplify this even further by using the csv module:

http://docs.python.org/2/library/csv.html

Hope this helps.
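
P.S. In case it's useful, here is roughly what the csv approach could look like. Treat it as a sketch under assumptions: it assumes the columns are tab-separated, `load_sizes_csv` is a name I invented for the example, and the sample rows are made up.

```python
import csv


def load_sizes_csv(lines, delimiter="\t"):
    # hypothetical helper: csv.reader does the splitting for us
    sizes = {}
    for row in csv.reader(lines, delimiter=delimiter):
        if not row:
            continue  # csv.reader yields [] for blank lines
        sizes[row[0]] = int(row[1])
    return sizes


# made-up tab-separated rows standing in for a file
sample = ["transcript_A\t1061", "transcript_B\t1023"]
sizes = load_sizes_csv(sample)
print(sizes)
```

`csv.reader` accepts any iterable of strings, so an open file object can be passed in place of `sample`.
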