"Peter Hansen" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> Alessandro Bottoni wrote:
>> (Python has even been said to be used by Yahoo! and Google, among 
>> others,
>> but nobody has been able to demonstrate this so far)
>
> Nobody, except Google's founders?
>
> http://www-db.stanford.edu/~backrub/google.html

I think the relevant paragraph is worth quoting here (****s added):
"
In order to scale to hundreds of millions of web pages, Google has a fast 
distributed crawling system. A single URLserver serves lists of URLs to a 
number of crawlers (we typically ran about 3). Both the URLserver and the 
crawlers are implemented in **Python**. Each crawler keeps roughly 300 
connections open at once. This is necessary to retrieve web pages at a fast 
enough pace. At peak speeds, the system can crawl over 100 web pages per 
second using four crawlers. This amounts to roughly 600K per second of 
data. A major performance stress is DNS lookup. Each crawler maintains 
its own DNS cache so it does not need to do a DNS lookup before crawling 
each document. Each of the hundreds of connections can be in a number of 
different states: looking up DNS, connecting to host, sending request, and 
receiving response. These factors make the crawler a complex component of 
the system. It uses asynchronous IO to manage events, and a number of 
queues to move page fetches from state to state.
"
This seems to have been about 2000.  Of course, bottleneck code may have 
been rewritten in C, but Google continues to hire Python programmers (among 
others).

Terry J. Reedy



-- 
http://mail.python.org/mailman/listinfo/python-list
