John Fouhy wrote at 14:47 10/10/2005:
>Some comments:
>
>----
>textAsString = input.read()
>
>S = ""
>for c in textAsString:
>    if c == "\n":
>        S += ' '
>    else:
>        S += c
>----
>
>You could write this more concisely as:
>
>S = textAsString.replace('\n', ' ')
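A quick way to confirm that the loop and the one-line replace() give the same result. This is only a sketch: the sample text below is made up and stands in for whatever input.read() returns.

```python
# Hypothetical sample text standing in for input.read()
textAsString = "first line\nsecond line\n\nfourth line"

# Character-by-character version from the original script
S_loop = ""
for c in textAsString:
    if c == "\n":
        S_loop += ' '
    else:
        S_loop += c

# One-call version using str.replace
S_replace = textAsString.replace('\n', ' ')

# True when both approaches produce the same string
print(S_loop == S_replace)
```

Every newline becomes exactly one space in both versions, so two newlines in a row leave two spaces in a row.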
Yes! Thanks. That should have occurred to me.

>----
># At this point, each element ("word" in code below) of L is
># a string containing a real word such as "dog",
># where "dog" may be prefixed and/or suffixed by strings of
># non-alphanumeric characters. So, for example, word could be "'dog?!".
># The following code first strips these prefixed or suffixed non-alphanumeric
># characters and then finds any words with dashes ("--") or forward slashes ("/"),
># such as in "and/or". These then become 2 or more words without the
># dashes or slashes.
>----
>
>What about using regular expressions?
>
>re.sub('\W+', ' ') will replace all non-alphanumeric characters with a
>single ' '. By the looks of things, the only difference is that if
>you had something like 'foo.bar' or 'foo&bar', your code would leave
>that as one word, whereas using the regex would convert it into two
>words.

Well, I'll have to learn the re module first. But I will.

>If you want to keep the meaning of your code intact, you could still
>use a regex to do it. Something like (untested)
>re.sub('\b\W+|\W+\b|-+|/+', ' ') might work.
>
>----
># Remove all empty elements of L, if any
>while "" in L:
>    L.remove("")
>
>for e in saveRemovedForLaterL:
>    L.append(e)
>
>F = []
>
>for word in L:
>    k = L.count(word)
>    if (k,word) not in F:
>        F.append((k,word))
>----
>
>There are a lot of hidden loops in here:
>
>1. '' in L
>This will look at every element of L, until it finds "" or it gets to
>the end.
>2. L.count(word)
>This will also look at every element of L.
>
>If you combine your loops into one, you should be able to save a lot of
>time.
>
>eg:
>
>for e in saveRemovedForLaterL:
>    L.append(e)
>
>counts = {}
>for word in L:
>    if not word: # This skips empty words.
>        continue
>    try:
>        counts[word] += 1
>    except KeyError:
>        counts[word] = 1
>F = [(count, word) for word, count in counts.iteritems()]

Things there I don't understand yet, I'm afraid. But I'll get to them.

Thanks for pushing me, John.
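For anyone following along, here is a minimal sketch of how the two suggestions quoted above (the simple '\W+' substitution and the single-pass dictionary count) might fit together. Some assumptions: it targets Python 3, where the quoted dict.iteritems() is spelled items(); the function names split_words and count_words and the sample sentence are made up for illustration. Note also that re.sub() takes three arguments (pattern, replacement, and the string to work on), and that the pattern is best written as a raw string, since in an ordinary string '\b' means a backspace character rather than a word boundary.

```python
import re

def split_words(text):
    """Replace each run of non-word characters with one space, then split."""
    # \W matches anything that is not a letter, digit, or underscore.
    # The r'' prefix passes the backslash through to the re module unchanged.
    return re.sub(r'\W+', ' ', text).split()

def count_words(words):
    """Return a list of (count, word) tuples, skipping empty strings."""
    counts = {}
    for word in words:
        if not word:           # skip empty strings instead of removing them first
            continue
        try:
            counts[word] += 1  # word already seen: add one to its count
        except KeyError:
            counts[word] = 1   # first time this word appears
    # items() is the Python 3 spelling of iteritems()
    return [(count, word) for word, count in counts.items()]

# Hypothetical sample text
sample = "'dog?!' and/or cat--dog, foo.bar dog"
L = split_words(sample)
F = count_words(L)
print(L)
print(sorted(F, reverse=True))
```

The try/except is the part that replaces L.count(word): each word is looked up in the dictionary once, rather than the whole list being rescanned for every word. As the quoted text warns, this simple pattern also splits 'foo.bar' into two words.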

Dick
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor