Unless you are authorized, don't do it. It costs the website you are crawling real money in CPU and bandwidth. Hundreds of concurrent requests can even kill a small, badly configured server. Look at the Scrapy package; it is great for scraping, but be friendly with the websites you are crawling.
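For example, here is a minimal sketch of a polite Scrapy spider, assuming Scrapy is installed (pip install scrapy); the spider name, example URLs, and title selector are just placeholders, not anything from your project:

import scrapy

class PoliteSpider(scrapy.Spider):
    name = "polite"
    # Placeholder URLs -- replace with your own list.
    start_urls = ["https://example.com/page1", "https://example.com/page2"]

    custom_settings = {
        "ROBOTSTXT_OBEY": True,               # respect the site's robots.txt
        "DOWNLOAD_DELAY": 1.0,                # pause between requests to the same site
        "CONCURRENT_REQUESTS_PER_DOMAIN": 2,  # stay well below "hundreds"
        "AUTOTHROTTLE_ENABLED": True,         # back off automatically if the server slows down
    }

    def parse(self, response):
        # Extract whatever content you need; the page title is just an example.
        yield {"url": response.url,
               "title": response.css("title::text").extract_first()}

Run it with something like: scrapy runspider polite_spider.py -o items.json
If you really need fresh content every minute, schedule that command from cron rather than keeping hundreds of threads alive.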
On 10 May 2017 at 23:22, <liyucun2...@gmail.com> wrote:
> Hi Everyone,
>
> Thanks for stopping by. I am working on a feature to crawl website content
> every 1 min. I am curious to know if there is any good open source project
> for this specific scenario.
>
> Specifically, I have many URLs, and I want to maintain a thread pool so
> that each thread repeatedly crawls content from its given URL. It could be
> hundreds of threads at the same time.
>
> Your help is greatly appreciated.
>
> ;)