Unless you are authorized, don't do it. It costs the website you are crawling real money in CPU and bandwidth. Hundreds of concurrent requests can even kill a small, badly configured server. Look at the Scrapy package; it is great for scraping, but be friendly with the websites you are crawling.
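For example, here is a minimal sketch of a polite Scrapy spider, assuming Scrapy is installed (pip install scrapy); the spider name, example URLs, and title selector are just placeholders, not anything from your project:

import scrapy

class PoliteSpider(scrapy.Spider):
    name = "polite"
    # Placeholder URLs -- replace with your own list.
    start_urls = ["https://example.com/page1", "https://example.com/page2"]

    custom_settings = {
        "ROBOTSTXT_OBEY": True,               # respect the site's robots.txt
        "DOWNLOAD_DELAY": 1.0,                # pause between requests to the same site
        "CONCURRENT_REQUESTS_PER_DOMAIN": 2,  # stay well below "hundreds"
        "AUTOTHROTTLE_ENABLED": True,         # back off automatically if the server slows down
    }

    def parse(self, response):
        # Extract whatever content you need; the page title is just an example.
        yield {"url": response.url,
               "title": response.css("title::text").extract_first()}

Run it with something like: scrapy runspider polite_spider.py -o items.json
If you really need fresh content every minute, schedule that command from cron rather than keeping hundreds of threads alive.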
On 10 May 2017 at 23:22, <liyucun2...@gmail.com> wrote:
> Hi Everyone,
>
> Thanks for stopping by. I am working on a feature to crawl website content
> every 1 min. I am curious to know if there is any good open source project
> for this specific scenario.
>
> Specifically, I have many URLs, and I want to maintain a thread pool so
> that each thread repeatedly crawls content from its given URL. It could be
> hundreds of threads at the same time.
>
> Your help is greatly appreciated.
>
> ;)