Shane Hathaway wrote:
> Thanks, but the credit goes to the people who have optimized Python dictionaries so well. Sometimes I wish such optimized code were available in plain C/C++.

There's still more fine-tuning possible. I've attached a version that shaves another second off the kjv100 test.

Then install Psyco (http://psyco.sourceforge.net/) and run the attached script. I measure 4.9 seconds, which could put the Python + Psyco version in first place, at least for a while. :-)

Shane

#!/usr/bin/python2.4

import sys

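def main():
    # Word list: one word per line; only these words are counted.
    words_fname = 'words.i'
    # Read text from the file named on the command line, else from stdin.
    if len(sys.argv) > 1:
        source = open(sys.argv[1])
    else:
        source = sys.stdin

    # Load the word list as dict keys so membership tests are O(1) on average.
    words = {}
    for line in open(words_fname):
        words[line.rstrip()] = 0

    # Count each whitespace-separated token that is in the word list.
    freq = {}
    for line in source:
        for word in line.split():
            if word in freq:
                # Already counted once, so it is known to be in the list.
                freq[word] += 1
            elif word in words:
                freq[word] = 1

    # Print "word count" pairs; dict iteration order is arbitrary.
    for word, count in freq.iteritems():
        print word, count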

# Have Psyco compile main() before it is called.
import psyco
psyco.bind(main)

main()
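
For anyone who wants to try the counting loop without words.i or Psyco installed, here is a minimal sketch of the same logic pulled into a function and fed in-memory data. The name count_known_words and the sample input are made up for illustration, and the try/except fallback is an assumption that is not in the attached script. It is written so that it should run on both Python 2 and Python 3.

```python
def count_known_words(lines, known_words):
    """Count each whitespace-separated word in lines that is in known_words."""
    freq = {}
    for line in lines:
        for word in line.split():
            if word in freq:
                # Seen before, so it already passed the known_words check.
                freq[word] += 1
            elif word in known_words:
                freq[word] = 1
    return freq

# Optional speedup: bind the function with Psyco if it is importable.
# Without Psyco the results are the same; only the run time differs.
try:
    import psyco
    psyco.bind(count_known_words)
except ImportError:
    pass

# Hypothetical sample data standing in for the input text and words.i.
sample = ["the quick fox", "the lazy dog and the fox"]
counts = count_known_words(sample, set(["the", "fox"]))
```

Here counts ends up as {"the": 3, "fox": 2}; words that are not in the known set never enter the dictionary at all.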