Kent Johnson wrote at 10:37 10/11/2005: >Kent Johnson wrote: > > Dick Moores wrote: > > > >> Kent Johnson wrote at 03:24 10/11/2005: > >> > >>> Dick Moores wrote: > >>> > >>>> (Execution took about 30 sec. with my computer.) > >>> > >>> > >>> That's way too long > >> > >> > >> > >> How long would you expect? I've already made some changes but haven't > >> seen the time change much. > > > > > > A couple of seconds at most, unless you are running it on some dog > > computer. It's just not that much text and you should be able to process > > it in a couple of passes at most. > >OK I couldn't resist. I took your program and ran it on my computer - >took about 38 seconds and got the same results as you. Then I made the >changes I outlined, and a few other similar ones, and got it down to 34 secs.
Yes, that's about the difference I was seeing. Thanks for taking the trouble. I went from 30 to 27. With no regex use (don't understand it yet). > Finally I made the change suggested by John Fouhy - to accumulate the > counts in a dict - and the time went down to 0.23 seconds. WOW! I didn't implement John's change because I didn't understand it. Haven't dealt with dictionaries yet. > >> Also, I'm puzzled that whether or not psyco is employed makes no > >> difference in the time. Can you explain why? > > > > > > My guess is it's because you have so many O(n^2) elements in the code. > > You have to get your algorithm to be O(n). OK, I finally bit the bullet, googled O(n^2), and read about Big O notation at <http://en.wikipedia.org/wiki/Big_O_notation> and <http://www.cprogramming.com/tutorial/computersciencetheory/algorithmicefficiency1.html> I think I've now got at least the basic idea. >In particular this code: >for word in L: > k = L.count(word) > if (k,word) not in F: > F.append((k,word)) > >L.count() has to scan through the entire list (L) looking for a match >with each word. So for each word, you are making 26140 string compares. >The total number of compares is 26140*26140 or 683,299,600. That's a lot! >Then, for each word, you scan F for a match. Now you are doing tuple >compares. The number of compares will increase as the length of F, but >overall it will be about 26140*3700/2 or 48,359,000 compares. Kent, you're beginning to get thru to me. Thanks for the details and the numbers. >Compare this to the dictionary version which just iterates L once, doing >a dictionary lookup and write for each word. > >The reason psyco doesn't make much difference is because all the time is >spent in list.count() which is already C code. Ah. But how can I know what is in C code and what isn't? For example, in a previous post you say that L.extend(e) is in C, and imply that L.append(e) isn't, and that therefore L.extend(e) should be used. Well, back to Hetlands' Beginning Python. Dick _______________________________________________ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor