I'm learning programming with Python.

I’ve written the code below for finding the most common words in a text
file that has about 1.1 million words. It's working fine, but I believe
there is always room for improvement.

When run, the script takes a file name from the command-line argument
sys.argv[1] and passes it to the function. The function opens the file
in read mode, converts the text to lowercase, and splits it on
whitespace into a list of words (split() with no arguments never
produces empty strings). It then counts the words in a
collections.Counter object, where each word is a key and its count is
the value. Finally, it returns a dictionary of the most common words and
their counts. The number of entries requested from words.most_common()
comes from the optional top parameter.

import sys
import collections

def find_most_common_words(textfile, top=10):
    ''' Returns the most common words in the textfile.'''

    textfile = open(textfile)
    text = textfile.read().lower()
    textfile.close()
    words = collections.Counter(text.split()) # how often each word appears

    return dict(words.most_common(top))

filename = sys.argv[1]
top_five_words = find_most_common_words(filename, 5)
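
As a quick check of the behaviour described above, here is a minimal
sketch that runs the same logic on a tiny sample instead of the real
1.1-million-word file. The sample sentence and the temporary file are
made up for illustration, and the function body uses a with-statement
in place of the explicit open()/close() pair, which is a substitution
and not what the script above does.

```python
import collections
import os
import tempfile


def find_most_common_words(textfile, top=10):
    """Return the `top` most common words in the file as a dict."""
    # The with-block closes the file even if read() raises.
    with open(textfile) as f:
        text = f.read().lower()
    # Counter maps each word to the number of times it appears.
    words = collections.Counter(text.split())
    return dict(words.most_common(top))


# Hypothetical sample text, written to a temporary file for the demo.
sample = "The cat and the dog and THE bird"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)

try:
    result = find_most_common_words(tmp.name, 2)
    print(result)  # {'the': 3, 'and': 2}
finally:
    os.remove(tmp.name)  # clean up the temporary file
```

Because the text is lowercased before counting, "The", "the" and "THE"
are all counted as the same word.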

I need your comments please.

Sri
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
