Re: [Bug-wget] Async webcrawling

2018-08-01 Thread Darshit Shah
Hi James, Wget2 is built on top of the libwget library, which uses asynchronous network calls. However, Wget2 is written such that it only uses one connection per thread. This is essentially a design decision to simplify the codebase. In case you want a more complex crawler, you can use libwget

Re: [Bug-wget] Async webcrawling

2018-07-31 Thread Tim Rühsen
On 31.07.2018 20:17, James Read wrote:
> Thanks,
>
> as I understand it though there is only so much you can do with
> threading. For more scalable solutions you need to go with async
> programming techniques. See http://www.kegel.com/c10k.html for a summary
> of the problem. I want to do large sc

Re: [Bug-wget] Async webcrawling

2018-07-31 Thread James Read
Thanks, as I understand it though there is only so much you can do with threading. For more scalable solutions you need to go with async programming techniques. See http://www.kegel.com/c10k.html for a summary of the problem. I want to do large scale webcrawling and am not sure if wget2 is up to t

Re: [Bug-wget] Async webcrawling

2018-07-31 Thread Tim Rühsen
On 31.07.2018 18:39, James Read wrote:
> Hi,
>
> how much work would it take to convert wget into a fully fledged
> asynchronous webcrawler?
>
> I was thinking something like using select. Ideally, I want to be able to
> supply wget with a list of starting point URLs and then for wget to crawl
>

[Bug-wget] Async webcrawling

2018-07-31 Thread James Read
Hi, how much work would it take to convert wget into a fully fledged asynchronous webcrawler? I was thinking something like using select. Ideally, I want to be able to supply wget with a list of starting point URLs and then for wget to crawl the web from those starting points in an asynchronous f
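
For readers unfamiliar with the select-based approach raised in this thread, here is a minimal sketch of the general idea: one thread watches several non-blocking sockets and reads from whichever becomes ready. It is not wget, wget2, or libwget code. The function names `read_all` and `demo` are hypothetical, and the demo uses local socket pairs in place of real HTTP connections so that it runs without network access.

```python
import select
import socket


def read_all(socks):
    """Read every socket in `socks` (name -> socket) until EOF.

    A single thread waits in select() and services whichever socket
    is readable, instead of dedicating one thread per connection.
    """
    buffers = {name: bytearray() for name in socks}
    pending = {s.fileno(): (name, s) for name, s in socks.items()}
    for _, s in pending.values():
        s.setblocking(False)

    while pending:
        # Block until at least one of the pending sockets has data or EOF.
        readable, _, _ = select.select(list(pending), [], [])
        for fd in readable:
            name, s = pending[fd]
            try:
                chunk = s.recv(4096)
            except BlockingIOError:
                continue  # spurious wakeup; try again on the next round
            if chunk:
                buffers[name] += chunk
            else:
                # Empty read means the peer closed the connection.
                del pending[fd]
                s.close()
    return {name: bytes(buf) for name, buf in buffers.items()}


def demo():
    """Simulate two 'servers' with local socket pairs (an assumption
    made for illustration; a crawler would connect to real hosts)."""
    pairs = {name: socket.socketpair() for name in ("a", "b")}
    for name, (_, remote) in pairs.items():
        remote.sendall(f"body of {name}".encode())
        remote.close()
    return read_all({name: local for name, (local, _) in pairs.items()})


if __name__ == "__main__":
    print(demo())
```

A real crawler would also need non-blocking connects, write readiness, timeouts, and URL queueing; select() itself scales poorly to very large descriptor sets, which is part of what the C10K page linked in the thread covers.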