Kent Johnson wrote:
> Dick Moores wrote:
>
>> Kent Johnson wrote at 03:24 10/11/2005:
>>
>>> Dick Moores wrote:
>>>
>>>> (Execution took about 30 sec. with my computer.)
>>>
>>> That's way too long
>>
>> How long would you expect? I've already made some changes but haven't
>> seen the time change much.
>
> A couple of seconds at most, unless you are running it on some dog
> computer. It's just not that much text and you should be able to
> process it in a couple of passes at most.
OK, I couldn't resist. I took your program and ran it on my computer;
it took about 38 seconds and got the same results as you. Then I made
the changes I outlined, and a few other similar ones, and got it down
to 34 seconds. Finally I made the change suggested by John Fouhy (to
accumulate the counts in a dict) and the time went down to 0.23
seconds.

>> Also, I'm puzzled that whether or not psyco is employed makes no
>> difference in the time. Can you explain why?
>
> My guess is it's because you have so many O(n^2) elements in the
> code. You have to get your algorithm to be O(n).

In particular, this code:

    for word in L:
        k = L.count(word)
        if (k, word) not in F:
            F.append((k, word))

L.count() has to scan through the entire list (L) looking for a match
with each word, so for each word you are making 26140 string compares.
The total number of compares is 26140*26140, or 683,299,600. That's a
lot!

Then, for each word, you scan F for a match. Now you are doing tuple
compares. The number of compares grows with the length of F, but
overall it comes to about 26140*3700/2, or 48,359,000 compares.

Compare this to the dictionary version, which just iterates over L
once, doing a dictionary lookup and write for each word.

The reason psyco doesn't make much difference is that almost all the
time is spent in list.count(), which is already C code.

Kent

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
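The dict-based accumulation described above might look something like
the sketch below. The names L and F follow the thread; the sample word
list and the count_words wrapper are made up here for illustration, and
details of the original program (sorting, output) are guessed at:

```python
def count_words(L):
    """Count word frequencies in one O(n) pass over L."""
    counts = {}
    for word in L:
        # A dict lookup/update is (amortized) constant time,
        # unlike L.count(word), which rescans the whole list.
        counts[word] = counts.get(word, 0) + 1
    # Build the (count, word) pairs the original code collected in F;
    # the dict guarantees uniqueness, so no membership test is needed.
    F = [(k, word) for word, k in counts.items()]
    F.sort(reverse=True)  # most frequent words first
    return F

words = ["the", "cat", "sat", "on", "the", "mat", "the"]
print(count_words(words))
# [(3, 'the'), (1, 'sat'), (1, 'on'), (1, 'mat'), (1, 'cat')]
```

Each word is touched once instead of triggering a full scan of L and a
scan of F, which is why the runtime drops from tens of seconds to a
fraction of a second.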