I would like to write a web (http) proxy which I can instrument to 
automatically extract information from certain web sites as I browse 
them. Specifically, I want to process URLs that match a particular
regexp. For those URLs, I would have code that parses the content and
logs some of it.
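A minimal sketch of the "match a regexp, then parse and log" part, assuming Python 3.8+. The decorator `on_url`, the function `dispatch`, the `HANDLERS` registry and the www.example.com pattern are all hypothetical names for illustration, not part of any existing library. The proxy would only need to call `dispatch(url, body)` once per response:

```python
import logging
import re
from typing import Callable, List, Tuple

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scraper")

# A handler receives the page URL and its decoded text.
Handler = Callable[[str, str], None]

# Hypothetical registry of (compiled URL regexp, handler) pairs.
HANDLERS: List[Tuple[re.Pattern, Handler]] = []


def on_url(pattern: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler for URLs matching `pattern`."""
    def register(func: Handler) -> Handler:
        HANDLERS.append((re.compile(pattern), func))
        return func
    return register


def dispatch(url: str, body: str) -> int:
    """Run every handler whose regexp matches `url`; return how many ran."""
    ran = 0
    for pattern, handler in HANDLERS:
        if pattern.search(url):
            handler(url, body)
            ran += 1
    return ran


# Example handler for a hypothetical site: log each page's <title>.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


@on_url(r"^https?://www\.example\.com/")
def log_title(url: str, body: str) -> None:
    match = TITLE_RE.search(body)
    if match:
        log.info("%s: %s", url, match.group(1).strip())
```

Keeping the site-specific parsing in small registered functions means the proxy code itself never has to change when a new site is added.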

Think of it as web scraping under manual control.

I found this list of Python web proxies:

http://www.xhaus.com/alan/python/proxies.html

Tiny HTTP Proxy in Python looks promising, as it's nominally simple (not
many lines of code):

http://www.okisoft.co.jp/esc/python/proxy/

It does what it's supposed to, but I'm a bit at a loss as to where to
intercept the traffic. I suspect it should be quite straightforward, but
I'm finding the code a bit opaque.
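One possible interception point, sketched under assumptions. This is not Tiny HTTP Proxy's code. It is a hypothetical standalone proxy built only on the standard library's `http.server` and `urllib.request` (Python 3.7+ for `ThreadingHTTPServer`). Because it reads each upstream response completely before relaying it, the whole body is available in one place, the hypothetical `inspect` function, just before it goes back to the browser. `WATCH`, `captured`, `SKIP` and `NoRedirect` are also illustrative names:

```python
import http.server
import re
import urllib.error
import urllib.request

# Hypothetical: only responses from URLs matching this are inspected.
WATCH = re.compile(r"^http://www\.example\.com/")
captured = []  # (url, text) pairs; a stand-in for real parsing/logging

# Hop-by-hop headers must not be copied between connections. Server/Date are
# re-added by send_response, Content-Length is recomputed from the buffered
# body, and Accept-Encoding is dropped so upstream sends uncompressed pages.
SKIP = {"connection", "keep-alive", "proxy-connection", "transfer-encoding",
        "te", "trailer", "upgrade", "accept-encoding", "server", "date",
        "content-length"}


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Relay 3xx responses to the browser instead of following them here."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


# Ignore any http_proxy environment setting so the proxy never loops to itself.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}), NoRedirect)


def inspect(url: str, body: bytes, content_type: str) -> None:
    """The interception point: called once per complete upstream response."""
    if WATCH.search(url) and content_type.startswith("text/html"):
        captured.append((url, body.decode("utf-8", errors="replace")))


class InterceptingProxyHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # A browser talking to a proxy puts the absolute URL in the request
        # line, e.g. "GET http://www.example.com/page HTTP/1.1".
        url = self.path
        if not url.startswith("http://"):
            self.send_error(400, "Expected an absolute http:// URL")
            return
        headers = {k: v for k, v in self.headers.items() if k.lower() not in SKIP}
        request = urllib.request.Request(url, headers=headers)
        try:
            with OPENER.open(request, timeout=30) as resp:
                status, resp_headers, body = resp.status, resp.headers, resp.read()
        except urllib.error.HTTPError as err:  # 3xx/4xx/5xx are still relayed
            status, resp_headers, body = err.code, err.headers, err.read()
        except urllib.error.URLError as err:
            self.send_error(502, "Upstream fetch failed: %s" % err.reason)
            return

        try:
            inspect(url, body, resp_headers.get("Content-Type", ""))
        except Exception:  # a scraping bug should not break browsing
            self.log_error("inspect() failed for %s", url)

        self.send_response(status)
        for key, value in resp_headers.items():
            if key.lower() not in SKIP:
                self.send_header(key, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Then point the browser's HTTP proxy setting at 127.0.0.1:8000.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 8000), InterceptingProxyHandler)
    server.serve_forever()
```

This sketch handles only plain-HTTP GET requests. HTTPS pages arrive as CONNECT tunnels whose contents a proxy cannot read without decrypting them, and form submissions would need a similar `do_POST`. Buffering whole responses gives up streaming, but it makes the "where do I intercept?" question trivial: everything goes through `inspect`.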

Any suggestions?

Andrew
-- 
http://mail.python.org/mailman/listinfo/python-list
