"Peter Hansen" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Alessandro Bottoni wrote: >> (Python has even been told to be used by Yahoo! and Google, among >> others, >> but nobody was able to demonstrate this, so far) > > Nobody, except Google's founders? > > http://www-db.stanford.edu/~backrub/google.html
I think the relevant paragraph is worth quoting here (****s added):

"
In order to scale to hundreds of millions of web pages, Google has a fast
distributed crawling system. A single URLserver serves lists of URLs to a
number of crawlers (we typically ran about 3). Both the URLserver and the
crawlers are implemented in **Python**. Each crawler keeps roughly 300
connections open at once. This is necessary to retrieve web pages at a fast
enough pace. At peak speeds, the system can crawl over 100 web pages per
second using four crawlers. This amounts to roughly 600K per second of data.
A major performance stress is DNS lookup. Each crawler maintains its own DNS
cache so it does not need to do a DNS lookup before crawling each document.
Each of the hundreds of connections can be in a number of different states:
looking up DNS, connecting to host, sending request, and receiving response.
These factors make the crawler a complex component of the system. It uses
asynchronous IO to manage events, and a number of queues to move page fetches
from state to state.
"

This seems to date from about 2000. Of course, bottleneck code may since have
been rewritten in C, but Google continues to hire Python programmers (among
others).

Terry J. Reedy

--
http://mail.python.org/mailman/listinfo/python-list