Hello once more. I am stuck on how best to apply the ideas from the "finding Unique Items in Lists" thread to my file.
This is where I am stuck. What I have here, taken from the unique items thread, does not work for me, because I need to keep each grouping separate according to the hg chain it is in (see below for examples):

    import sys

    WFILE = open(sys.argv[1], 'w')

    def get_list_dup_dict(fname='Z:/datasets/fooyoo.txt', threshold=2):
        # Count how often each line occurs in the whole file.
        a_list = open(fname, 'r')
        #print "beginning get_list_dup"
        items_dict, dup_dict = {}, {}
        for i in a_list:
            i = i.strip()
            items_dict[i] = items_dict.get(i, 0) + 1
        a_list.close()
        # Keep only the lines seen at least `threshold` times.
        for k, v in items_dict.iteritems():
            if v >= threshold:
                dup_dict[k] = v
        return dup_dict

    def print_list_dup_report(fname='Z:/datasets/fooyoo.txt', threshold=2):
        #print "Beginning report generation"
        dup_dict = get_list_dup_dict(fname=fname, threshold=threshold)
        for k, v in sorted(dup_dict.iteritems()):
            print >>WFILE, '%s occurred %s times' % (k, v)

    if __name__ == '__main__':
        print_list_dup_report()
        WFILE.close()

My issue is that my file is laid out as follows:

    hg17_chainMm5_chr15 range=chr7:148238502-148239073
    ENST00000339563.1
    ENST00000342196.1
    ENST00000339563.1
    ENST00000344055.1
    hg17_chainMm5_chr13 range=chr5:42927967-42928726
    ENST00000279800.3
    ENST00000309556.3
    hg17_chainMm5_chr6 range=chr1:155548627-155549517
    ENST00000321157.3
    ENST00000256324.4

I need a printout that gives the hg17... line and then any ENST entries that occur more than once, counted only within that chain section. Even better would be if it only printed the hg17 line when it is followed by an ENST entry that occurs more than once.

I am hoping for something that gives me an output file roughly like:

    hg17_chainMm5_chr15 range=chr7:148238502-148239073
    ENST00000339563.1 occurs 2 times
    hg17_chainMm5_chr13 range=chr5:42927967-42928726
    ENST00000279800.3 occurs 2 times

All help and ideas are appreciated. I am trying to get this finished as soon as possible; the output file will be used to go back to my 2 GB file and pull out the rest of the data I need.

Thanks,
Scott
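A rough sketch of the per-chain counting described above, not a finished solution. It assumes that every chain section starts with a line beginning with "hg17" and that each ENST ID sits on its own line after it. The names iter_chains, repeated_ids, chain_dup_report and the header_prefix parameter are made up for illustration; SAMPLE is just the example data from this message. It avoids print statements and iteritems, so it should run on either Python 2 or Python 3.

```python
def iter_chains(lines, header_prefix='hg17'):
    """Yield (header, ids) pairs, one per chain section.

    A section starts at a line beginning with header_prefix (an assumed
    convention) and runs until the next such line. Lines that appear
    before the first header are ignored.
    """
    header, ids = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(header_prefix):
            if header is not None:
                yield header, ids
            header, ids = line, []
        elif header is not None:
            ids.append(line)
    if header is not None:
        yield header, ids


def repeated_ids(ids, threshold=2):
    """Return (id, count) pairs, in first-seen order, for IDs that
    occur at least threshold times."""
    counts, order = {}, []
    for item in ids:
        if item not in counts:
            order.append(item)
        counts[item] = counts.get(item, 0) + 1
    return [(item, counts[item]) for item in order
            if counts[item] >= threshold]


def chain_dup_report(lines, header_prefix='hg17', threshold=2):
    """Build the report lines; a header is included only when its
    section contains at least one repeated ID."""
    report = []
    for header, ids in iter_chains(lines, header_prefix):
        repeats = repeated_ids(ids, threshold)
        if repeats:
            report.append(header)
            for item, count in repeats:
                report.append('%s occurs %s times' % (item, count))
    return report


# Example data copied from the message above.
SAMPLE = """\
hg17_chainMm5_chr15 range=chr7:148238502-148239073
ENST00000339563.1
ENST00000342196.1
ENST00000339563.1
ENST00000344055.1
hg17_chainMm5_chr13 range=chr5:42927967-42928726
ENST00000279800.3
ENST00000309556.3
hg17_chainMm5_chr6 range=chr1:155548627-155549517
ENST00000321157.3
ENST00000256324.4
""".splitlines()
```

With SAMPLE, chain_dup_report(SAMPLE) returns only the chr15 header followed by "ENST00000339563.1 occurs 2 times", because no ID repeats within the other two sections. Since iter_chains reads one line at a time and keeps only the current section in memory, an open file object could be passed in place of SAMPLE, and the returned lines joined with newlines and written to the output file.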