On 30/09/2017 18:12, Sri G. wrote:
I'm learning programming with Python.

I’ve written the code below for finding the most common words in a text
file that has about 1.1 million words. It's working fine, but I believe
there is always room for improvement.

When run, the function in the script gets a text file from the command-line
argument sys.argv[1], opens the file in read mode, converts the text to
lowercase, splits the text on whitespace into a list of words, and stores
each word as a key and its count as the value in a collections.Counter
object. Finally, it returns a dictionary of the most common words and
their counts. The words.most_common() method gets its argument from the
optional top parameter.

import sys
import collections

def find_most_common_words(textfile, top=10):
     ''' Returns the most common words in the textfile.'''

     textfile = open(textfile)
     text = textfile.read().lower()

The modern Pythonic way is:-

with open(textfile) as textfile:
    text = textfile.read().lower()

Closing the file is handled automatically for you. For those who don't know, this construct using the "with" keyword relies on what is called a context manager. Here's an article about them: https://jeffknupp.com/blog/2016/03/07/python-with-context-managers/
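To see the automatic close in action, here is a minimal sketch. The temporary file and the name infile are made up for the demonstration and are not part of the original script; they only keep the example self-contained.

```python
import os
import tempfile

# Create a small throwaway file so the example can run anywhere
# (hypothetical sample text, not the 1.1 million word file).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("The quick brown fox")
    path = tmp.name

with open(path) as infile:
    text = infile.read().lower()

# Leaving the with block closed the file, and it would have done so
# even if read() had raised an exception.
print(infile.closed)
print(text)

os.remove(path)  # tidy up the throwaway file
```

Using a different name for the file object (infile here) rather than rebinding textfile is optional, but it keeps the filename and the file object apart.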

     words = collections.Counter(text.split()) # how often each word appears

     return dict(words.most_common(top))

filename = sys.argv[1]

How about some error handling if the user forgets the filename? The Pythonic way is to use a try/except looking for an IndexError, but there's nothing wrong with checking the length of sys.argv.
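One possible shape for the try/except approach is sketched below. The helper name get_filename and the wording of the usage message are assumptions for illustration only, and the argument list is simulated so the snippet runs on its own.

```python
import sys


def get_filename(argv):
    """Return the filename from argv, or exit with a usage message.

    get_filename is a hypothetical helper, not part of the original script.
    """
    try:
        return argv[1]
    except IndexError:
        # Passing a string to sys.exit prints it to stderr and
        # exits with status 1.
        sys.exit(f"Usage: python {argv[0]} <textfile>")


# Simulated argument list; a real script would pass sys.argv instead.
print(get_filename(["most_common.py", "book.txt"]))
```

Checking len(sys.argv) < 2 before indexing would work just as well; the try/except version simply asks forgiveness rather than permission.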

top_five_words = find_most_common_words(filename, 5)

I need your comments please.


Pretty good all in all :)

My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence
