On 30/09/2017 18:12, Sri G. wrote:
I'm learning programming with Python.

I’ve written the code below for finding the most common words in a text
file that has about 1.1 million words. It's working fine, but I believe
there is always room for improvement.

When run, the script takes the name of a text file from the command-line
argument sys.argv[1]. The function opens the file in read mode, converts
the text to lowercase, splits it on whitespace into a list of words, and
counts them in a collections.Counter object, with each word as a key and
its count as the value. Finally, it returns a dictionary of the most
common words and their counts. The optional top parameter is passed as the
argument to words.most_common().

import sys
import collections

def find_most_common_words(textfile, top=10):
     ''' Returns the most common words in the textfile.'''

     textfile = open(textfile)
     text = textfile.read().lower()
     textfile.close()

The modern Pythonic way is:-

with open(textfile) as textfile:
    text = textfile.read().lower()

The file is closed automatically for you, even if an exception is raised inside the block. For those who don't know, this construct uses the "with" keyword, and the object it manages is called a context manager. Here's an article about them:
https://jeffknupp.com/blog/2016/03/07/python-with-context-managers/
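
If you want to convince yourself that the file really is closed, here is a small sketch. The temporary file and the names sample_path and file_was_closed are made up for illustration only; they are not part of the original script.

```python
import os
import tempfile

# Hypothetical sample file, created only so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Spam spam EGGS")
    sample_path = tmp.name

# Same pattern as suggested above: no explicit close() call is needed.
with open(sample_path) as textfile:
    text = textfile.read().lower()

# After the block has been left, the file object reports itself as closed.
file_was_closed = textfile.closed

# Tidy up the hypothetical sample file.
os.remove(sample_path)
```

The same pattern works for any object that supports the context manager protocol, not just files.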

     words = collections.Counter(text.split()) # how often each word appears

     return dict(words.most_common(top))

filename = sys.argv[1]

How about some error handling if the user forgets the filename? The Pythonic way is to use a try/except looking for an IndexError, but there's nothing wrong with checking the length of sys.argv.
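
Something along these lines would do, as a rough sketch of both approaches. The helper names get_filename and get_filename_by_length, and the script name most_common.py in the usage message, are my own inventions for illustration.

```python
import sys

USAGE = 'Usage: python most_common.py TEXTFILE'  # hypothetical script name


def get_filename(argv):
    '''Return the first command-line argument, or exit with a usage message.'''
    try:
        return argv[1]
    except IndexError:
        # Passing a string to sys.exit() prints it to stderr and exits
        # with status 1.
        sys.exit(USAGE)


def get_filename_by_length(argv):
    '''The same check, done by testing the length of the argument list.'''
    if len(argv) < 2:
        sys.exit(USAGE)
    return argv[1]


# In the script itself you would then write:
#     filename = get_filename(sys.argv)
```

Either version replaces the bare sys.argv[1] lookup, so a forgotten filename produces a short usage message instead of a traceback.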

top_five_words = find_most_common_words(filename, 5)

I need your comments please.

Sri

Pretty good all in all :)

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence
