Either way, you can nest the for loops and match on the protein IDs, or load one file into a dictionary and write out the lines of the other file that match it:
Non-Python pseudocode:

    for every line in TWO:
        get its protein ID
        for every line in ONE:
            if the protein ID on this line of ONE is the same as the one from TWO:
                perform the string merging and write it to the file
            else:
                move on to the next line in ONE

--OR--

    for every line in ONE:
        add a dictionary entry with key = the protein ID and value = the rest of the line
    for every line in TWO:
        if the dictionary has the same protein ID:
            perform the string merging and write it to the file

I'm assuming you want an 'inner join' (drop non-matches). For an 'outer/right join' (keep every protein in TWO), set a 'matchmade' flag to False for each line of TWO before the inner loop, set it to True whenever a match is written, and if no match was made, write the protein to the merged file with null values.

If you plan on querying or sharing the newly organized dataset, use a database. If the file is going into a workflow, it probably wants to be plain text. I'd probably do both.
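The dictionary version could look something like this in Python. It is a minimal sketch under assumptions that are not in the original: each line is tab-separated with the protein ID as its first field, and "string merging" means joining the ID, the rest of TWO's line and the rest of ONE's line with tabs. The names `load_index` and `inner_join` are hypothetical. Both functions accept any iterable of lines, so open file objects work as well as lists.

```python
def load_index(lines_one):
    """Map protein ID -> rest of the line for every line in ONE.

    Assumes a hypothetical tab-separated layout with the protein ID first.
    If an ID appears more than once in ONE, the last line wins.
    """
    index = {}
    for line in lines_one:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        protein_id, _, rest = line.partition("\t")
        index[protein_id] = rest
    return index


def inner_join(lines_one, lines_two):
    """Yield merged lines for proteins in TWO that also appear in ONE."""
    index = load_index(lines_one)
    for line in lines_two:
        line = line.rstrip("\n")
        if not line:
            continue
        protein_id, _, rest = line.partition("\t")
        if protein_id in index:  # one dictionary lookup replaces the inner loop
            yield "\t".join([protein_id, rest, index[protein_id]])
        # non-matches are simply dropped (inner join)


if __name__ == "__main__":
    # Small in-memory demo; real code would pass open file objects instead.
    one = ["P1\tkinase\n", "P2\tligase\n"]
    two = ["P2\t0.5\n", "P3\t0.9\n", "P1\t0.1\n"]
    print(list(inner_join(one, two)))  # → ['P2\t0.5\tligase', 'P1\t0.1\tkinase']
```

To produce the merged file, open ONE, TWO and an output file, iterate over `inner_join(one_file, two_file)` and write each result followed by a newline. Each line of TWO costs one dictionary lookup instead of a full pass over ONE, which is why this version scales better than the nested loops on large files.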
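For the outer/right join, the nested-loop version with the 'matchmade' flag might be sketched like this. It uses the same hypothetical tab-separated layout; the name `right_join` and the `"NA"` null placeholder are illustrative choices, not anything from the original. One catch when turning the nested-loop pseudocode into Python: a file object can only be iterated once, so if ONE were an open file the inner loop would find nothing after the first line of TWO. The sketch therefore reads ONE into a list before looping.

```python
def right_join(lines_one, lines_two, null="NA"):
    """Nested-loop join that keeps every protein in TWO.

    Proteins with no match in ONE get the `null` placeholder instead.
    """
    # Read ONE once into (id, sep, rest) tuples so it can be re-scanned
    # for every line of TWO.
    rows_one = [line.rstrip("\n").partition("\t") for line in lines_one if line.strip()]
    for line in lines_two:
        line = line.rstrip("\n")
        if not line:
            continue
        protein_id, _, rest = line.partition("\t")
        matchmade = False  # reset for each protein in TWO
        for other_id, _, other_rest in rows_one:
            if other_id == protein_id:
                yield "\t".join([protein_id, rest, other_rest])
                matchmade = True  # keep scanning in case ONE has duplicates
        if not matchmade:
            yield "\t".join([protein_id, rest, null])


if __name__ == "__main__":
    one = ["P1\tkinase\n", "P2\tligase\n"]
    two = ["P2\t0.5\n", "P3\t0.9\n", "P1\t0.1\n"]
    print(list(right_join(one, two)))  # → ['P2\t0.5\tligase', 'P3\t0.9\tNA', 'P1\t0.1\tkinase']
```

Dropping the final `if not matchmade` branch turns this back into the inner join from the first pseudocode.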
_______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor