Re: [Tutor] Please look at my wordFrequency.py

John Fouhy Mon, 10 Oct 2005 14:48:35 -0700

On 11/10/05, Dick Moores <[EMAIL PROTECTED]> wrote:
> I worked on this a LONG time for something I expected to just be an easy
> and possibly useful exercise. Three times I started completely over with
> a new approach. Had a lot of trouble removing exactly the characters I
> didn't want to appear in the output. Wished I knew how to debug other
> than just by using a lot of print statements.


I like pdb for debugging, but I don't know how well it works with IDLE
(I don't use IDLE).

Some comments:

----
textAsString = input.read()

S = ""
for c in textAsString:
    if c == "\n":
        S += ' '
    else:
        S += c
----

You could write this more concisely as:

S = textAsString.replace('\n', ' ')
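
A quick way to convince yourself the two forms agree (a small sketch; the
sample text is made up and stands in for input.read()):

```python
# Hypothetical sample text standing in for input.read()
textAsString = "the quick\nbrown fox\njumps\n"

# Original approach: build the string one character at a time
looped = ""
for c in textAsString:
    if c == "\n":
        looped += ' '
    else:
        looped += c

# One-call equivalent
S = textAsString.replace('\n', ' ')

print(looped == S)
```

The replace() version also avoids building a new string on every pass
through the loop.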

----
# At this point, each element ("word" in code below) of L is
# a string containing a real word such as "dog",
# where "dog" may be prefixed and/or suffixed by strings of
# non-alphanumeric characters. So, for example, word could be "'dog?!".
# The following code first strips these prefixed or suffixed non-alphanumeric
# characters and then finds any words with dashes ("--") or forward
# slashes ("/"),
# such as in "and/or". These then become 2 or more words without the
# dashes or slashes.
----

What about using regular expressions?

re.sub(r'\W+', ' ', S) will replace each run of non-alphanumeric
characters with a single ' ' (strictly, \W is anything other than a
letter, digit or underscore).  By the looks of things, the only
difference is that if you had something like 'foo.bar' or 'foo&bar',
your code would leave that as one word, whereas using the regex would
convert it into two words.
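
For instance, something along these lines (a sketch only; the sample
string is invented for illustration):

```python
import re

# Made-up sample; note the embedded punctuation in 'foo.bar' and 'foo&bar'
S = "'dog?! and/or foo.bar foo&bar"

# Each run of non-word characters becomes one space.
# \W means "not a letter, digit or underscore", so '_' would be left alone.
cleaned = re.sub(r'\W+', ' ', S)

# split() with no arguments also discards the leading space left by "'"
words = cleaned.split()
print(words)
```

Here 'foo.bar' and 'foo&bar' each come out as two words, which is the
difference described above.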

If you want to keep the meaning of your code intact, you could still
use a regex to do it.  Something like (untested)
re.sub(r'\b\W+|\W+\b|-+|/+', ' ', S) might work.  Note the raw string:
without the r prefix, '\b' is a backspace character, not a word boundary.
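
One caveat with a single pattern like that: the '.' in 'foo.bar' sits
right next to a word boundary, so it would be matched as well.  An
alternative is to work one token at a time.  The following is a rough
sketch, not your original code: split_words is a made-up name, and I am
assuming that any run of '-' or '/' should separate words.

```python
import re

def split_words(text):
    """Split on whitespace, then on runs of '-' or '/', then trim
    non-word characters from the ends of each piece.
    Punctuation inside a piece (as in 'foo.bar') is kept."""
    words = []
    for token in text.split():
        # "and/or" -> ["and", "or"]; "wait--what" -> ["wait", "what"]
        for piece in re.split(r'[-/]+', token):
            # "'dog?!" -> "dog"
            core = re.sub(r'^\W+|\W+$', '', piece)
            if core:              # skip pieces that were only punctuation
                words.append(core)
    return words

print(split_words("'dog?! and/or foo.bar wait--what"))
```

Since \W does not match '_', leading or trailing underscores would
survive; adjust the character class if that matters for your text.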

----
# Remove all empty elements of L, if any
while "" in L:
    L.remove("")

for e in saveRemovedForLaterL:
    L.append(e)

F = []

for word in L:
    k = L.count(word)
    if (k,word) not in F:
        F.append((k,word))
----

There are a lot of hidden loops in here:

1. '' in L
This will look at every element of L, until it finds "" or it gets to the end.
2. L.count(word)
This will also look at every element of L, and it runs once for every
word in L, so the total work grows with the square of the list length.

If you combine your loops into one, you should be able to save a lot of time.

eg:

for e in saveRemovedForLaterL:
    L.append(e)

counts = {}
for word in L:
    if not word:      # This skips empty words.
        continue
    try:
        counts[word] += 1
    except KeyError:
        counts[word] = 1
F = [(count, word) for word, count in counts.items()]
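
Put together as something you can run on its own (a sketch: the function
name and the sample list are made up, and I have used dict.get in place
of the try/except above, which does the same job):

```python
def word_frequencies(words):
    """Return a list of (count, word) pairs, ignoring empty strings."""
    counts = {}
    for word in words:
        if not word:          # skip empty words
            continue
        # get() returns 0 the first time a word is seen
        counts[word] = counts.get(word, 0) + 1
    return [(count, word) for word, count in counts.items()]

# Hypothetical input standing in for L
L = ["the", "dog", "", "the", "cat", "the", ""]
F = word_frequencies(L)
F.sort(reverse=True)          # most frequent first
print(F)
```

Each word is looked at once, and the dictionary lookup does not scan the
list, so this should scale much better than calling L.count() per word.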

--
John.
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor