On Wed, Sep 27, 2017 at 8:03 AM Justin Israel <[email protected]>
wrote:

> Replying inline with comments and bugs...
>
> On Wed, Sep 27, 2017 at 6:02 AM jettam <[email protected]> wrote:
>
>> I am trying to list the top 3 most frequently occurring words in a text
>> document. I have managed to distill the counts down to the top three
>> [13, 22, 24]. But for some reason my final print statement gives me the 4
>> most frequent words, and not even in numerical order [22, 22, 24, 13].
>> Could someone show me why this is happening?
>>
>> I have attached the text file that I am sourcing called EinsteinCredo.txt
>>
>> ''' Read this text file and return the top three most ocurring words '''
>>
>> inFile = r'E:\ProfessionalDevelopment\python\Introduction to Python Scripting in Maya\week4\EinsteinCredo.txt'
>> wordList=[]
>> occurences=[]
>> with open(inFile, 'r') as fin:
>>
>>     # removes the punctuation and splits the words into a list
>>     for line in fin:
>>         punct = ["'","?",".","!","?",",","\r\n","-"]
>>
>
> This has a duplicate for "?". You should define this once, outside the
> entire loop.
>
>
>>
>>         for p in punct:
>>             line = line.replace(p,"").upper()
>>
>
> Just do the upper() once, before you split the string, instead of once for
> every time you replace punctuation.
>
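
To make the two comments above concrete, here is a rough sketch of how the
cleanup step could look. The names PUNCT and clean_line are just ones I made
up for illustration, they are not from your script.

```python
# Punctuation to strip. Defined once, outside any loop, with no duplicates.
PUNCT = ["'", "?", ".", "!", ",", "-"]


def clean_line(line):
    """Upper-case a line once, strip punctuation, and split it into words."""
    line = line.upper()
    for p in PUNCT:
        line = line.replace(p, "")
    # split() with no arguments splits on any whitespace, so the trailing
    # "\r\n" is discarded without having to list it as punctuation.
    return line.split()


print(clean_line("Don't stop, never-ending!\r\n"))
```

Note that removing "-" outright joins hyphenated words together, which is
what your original replace() did as well.
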
>
>>
>>         line = line.split()
>>         for word in line:
>>             wordList.append(word)
>>
>>
>> # make a word count list
>> for x in wordList:
>>     occurences.append(wordList.count(x))
>>
>
> Be aware that wordList can contain duplicate words, so you append the same
> count once for every occurrence of a word. Each count() call also rescans
> the whole list.
>
>
>>
>>
>> # make a dictionary of both the wordList and occurences
>> wordFrequencey = dict(zip(wordList,occurences))
>>
>> # find the top three most occuring words
>> order = list(set(sorted(wordFrequencey.values())))
>>
>
> This is a bug. You pull the values away from their keys and collapse them
> into a new list, which is also likely smaller than the key list. So nothing
> in it maps back to a word any more. There is also no reason to use the
> sorted() call the way you are. A set is unordered, so passing the sorted
> list into set() throws the ordering away again. If you want unique counts
> in order, it has to be sorted(set(...)).
>
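
A small illustration of the ordering point, using made-up counts rather than
the ones from your file. De-duplicate first, then sort, so the result is a
list whose order you can rely on:

```python
counts = [22, 13, 24, 22, 5, 13]  # made-up counts, with repeats

# set() removes the repeats, sorted() then returns an ascending list.
unique_sorted = sorted(set(counts))
print(unique_sorted)       # [5, 13, 22, 24]

# The last three entries are the three largest distinct counts.
print(unique_sorted[-3:])  # [13, 22, 24]
```

This only fixes the ordering of the counts. It does not fix the fact that the
counts are no longer attached to any word.
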
>
>>
>>
>> topThree = order[-3:]
>>
>> # print the results
>> for k, v in wordFrequencey.items():
>>     if v in topThree:
>>         print 'the word " %s " occured %s times' % (k,v)
>>
>>
> This is the other part of the bug. Your approach of checking if the count
> value is in your top 3 is inherently broken. What if different words have
> the same count, such as 22 in your example? You will end up printing every
> word whose count appears in topThree, which can be more than three words,
> and in whatever order the dictionary happens to iterate.
>
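
Here is a tiny reproduction of the tie problem. The words and counts are
invented for the example, they are not taken from EinsteinCredo.txt:

```python
# Made-up word counts in which two words tie at 22.
word_counts = {"THE": 24, "OF": 22, "AND": 22, "TO": 13, "A": 5}
top_three = [13, 22, 24]

# Filtering by count value matches every word that shares one of the counts.
matches = [word for word, count in word_counts.items() if count in top_three]
print(len(matches))  # 4, not 3
```
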
> Remember that other mail thread where I was suggesting that you just get
> rid of the whole multi-list and zip approach? It would be better to just
> build up a dictionary directly within that word loop. That way you have a
> unique mapping of words to their occurrences. Then you can use
> sorted(words.items(), words.get) in order to sort the words by their value,
> in reverse order. That resulting list will let you slice off the last
> three, which will be the (key, val) tuples. You will no longer have issues
> with managing separate key/value lists.
>

I had a typo in this suggestion. I should have said to use:
sorted(words, key=words.get)
The get method has to be passed as the key argument. The result is the keys
sorted by their values, in ascending order. You can then slice off the last 3
and look up their occurrences again in the words dictionary.
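
In case it helps, here is a minimal sketch of that approach. The function
name top_words and the sample lines are my own placeholders, and I have left
out the punctuation stripping to keep it short. Treat it as one way to do it
rather than the definitive version.

```python
def top_words(lines, n=3):
    """Return the n most frequent words as (word, count), most frequent first."""
    words = {}
    for line in lines:
        for word in line.upper().split():
            # Build the word -> count mapping directly, one entry per word.
            words[word] = words.get(word, 0) + 1

    # Keys sorted by their counts, ascending. The last n are the most frequent.
    ranked = sorted(words, key=words.get)
    return [(word, words[word]) for word in reversed(ranked[-n:])]


# Hypothetical input standing in for the lines of the text file.
sample = ["the cat and the dog", "the dog and the bird", "the dog"]
for word, count in top_words(sample):
    print('the word "%s" occurred %s times' % (word, count))
```

If several words tie at the cutoff, this keeps exactly n of them and drops
the rest, so decide whether that is what you want. The standard library also
has collections.Counter, whose most_common(3) does the same job.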


>
> Let me know if you want the example, or if you just want to work through
> these suggestions on your own?
>
>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Python Programming for Autodesk Maya" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/python_inside_maya/0655c287-af16-4657-a808-e77d56d65ed3%40googlegroups.com
>> <https://groups.google.com/d/msgid/python_inside_maya/0655c287-af16-4657-a808-e77d56d65ed3%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
