On 2012-01-03, David Harks <d...@dwink.net> wrote: > On 1/3/2012 12:13 PM, Sean Wolfe wrote: >> if we are coding in python but looking for >> more performance, > > Are you in fact in this situation? Despite years of folks > mentioning how Python is 'slower than C++', I've seen a project > where a developer churned out a feature using Python's > generators that performed much faster than the C++ > implementation it replaced. It wasn't because the C++ was > slower by nature; it's because it was harder to express the > optimal algorithm in C++ so the C++ developer chose a > sub-optimal approach in the interest of meeting a deadline.
I was again looking through The Practice of Programming (Kernighan & Pike) today, and the chapter on the Markov chain program makes similar tradeoffs. For educational purposes they implement the program in several languages, and the C implemenation is fastest by far. The C++ implementation is surprisingly not close in performance to the C version, for two main reasons. 1. They used a std::deque to store the prefixes, when a std::list is the correct container. They point this out in a note on the poor performance of the C++ program, but do not note the irony that they had automatically used a singly-linked list in C. 2. When building the data structure in C they store only pointers to the original text, which they munge in memory to create zero-terminated strings. This is idiomatic C. But there's no reason C++ can't do a similar optimization. A std::list<char*> is certainly possible, but loses you most of the benefits of the standard containers. Best is probably to create a registry of words using a std::map<int, std::string>, with your lists storing references to words in the registry. You still have to copy of the text, but only once. The C++ implementation starts to smell sort of like Python. ;) -- Neil Cerutti -- http://mail.python.org/mailman/listinfo/python-list