I'm learning programming with Python. I’ve written the code below for finding the most common words in a text file that has about 1.1 million words. It's working fine, but I believe there is always room for improvement.
When run, the script takes a text file name from the command-line argument sys.argv[1], opens the file in read mode, converts the text to lowercase, splits the text into a list of words (which discards whitespace and empty strings), and counts the words in a collections.Counter object, with each word as a key and its count as the value. Finally, it returns a dictionary of the most common words and their counts. The words.most_common() method gets its argument from the optional top parameter.

    import sys
    import collections

    def find_most_common_words(textfile, top=10):
        ''' Returns the most common words in the textfile.'''

        textfile = open(textfile)
        text = textfile.read().lower()
        textfile.close()
        words = collections.Counter(text.split()) # how often each word appears

        return dict(words.most_common(top))

    filename = sys.argv[1]
    top_five_words = find_most_common_words(filename, 5)

I need your comments please.

Sri
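
P.S. One variation that might be worth comparing against, offered as a rough sketch rather than a definitive improvement. It reads the file one line at a time instead of loading all of it at once, uses a with block so the file is closed even if an error occurs, and strips leading and trailing ASCII punctuation so that "word," and "word" are counted together. The function name find_most_common_words_streaming and the UTF-8 encoding are assumptions made for illustration; they are not part of the script above.

```python
import collections
import string


def find_most_common_words_streaming(path, top=10):
    """Return the `top` most common words in the file as (word, count) pairs.

    Hypothetical variant of find_most_common_words: it processes the file
    line by line and strips surrounding ASCII punctuation from each word.
    """
    counts = collections.Counter()
    # Assumption: the text file is UTF-8 encoded.
    with open(path, encoding="utf-8") as f:
        for line in f:
            for word in line.lower().split():
                # Remove punctuation at the start and end of the word only.
                word = word.strip(string.punctuation)
                if word:  # skip tokens that were pure punctuation
                    counts[word] += 1
    # most_common() returns a list ordered from most to least frequent.
    return counts.most_common(top)
```

It would be called the same way as the original, for example find_most_common_words_streaming(sys.argv[1], 5). Returning the list from most_common() directly keeps the ordering explicit; wrapping it in dict() also preserves that order on Python 3.7 and later.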