Re: problems http-client

2006-01-06 Thread Jérôme Charron
A related issue is that these two plugins replicate a lot of code. At some point we should try to fix that. See: http://www.nabble.com/protocol-http-versus-protocol-httpclient-t521282.html I have begun working on this. Nobody else? Can I go on? Jérôme -- http://motrech.free.fr/

Re: problems http-client

2006-01-06 Thread Andrzej Bialecki
Jérôme Charron wrote: A related issue is that these two plugins replicate a lot of code. At some point we should try to fix that. See: http://www.nabble.com/protocol-http-versus-protocol-httpclient-t521282.html I have begun working on this. Nobody else? Can I go on?

Re: problems http-client

2006-01-06 Thread AJ Chen
I have started to see this problem recently. topN=20 per crawl, but fetched pages = 15 - 17, while error pages = 2000 - 5000. 25000 pages are missing. This is reproducible with Nutch 0.7.1, with both protocol-http and protocol-httpclient included. I also see lots of Response content

Re: problems http-client

2006-01-05 Thread Doug Cutting
Andrzej Bialecki wrote: Hmm... I'm not saying it's flawless, there were surely some mysterious things going on with it. That large crawl you mention, was it with the (recently updated in Nutch) release 3.0? What were the issues? No, it was in early December, with the previous version. I

Re: problems http-client

2005-12-19 Thread Andrzej Bialecki
Stefan Groschupf wrote: Anyway, today we noted that when fetching with http-client, the sum of errors and fetched pages is much less than the size defined when generating the segment. Changing to protocol-http solves the problem. Has anyone else noticed this behavior? I haven't, but this
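
[Editor's sketch] The symptom described above is an accounting gap: fetched pages plus error pages should add up to the size of the generated segment. A minimal way to express that check, assuming you have already read the three counts off your own fetcher logs, might look like the following. The function name and the sample numbers are hypothetical and are not taken from Nutch or from this thread.

```python
def missing_pages(generated: int, fetched: int, errors: int) -> int:
    """Return how many generated URLs are unaccounted for after a fetch.

    A result of 0 means every generated URL ended up either fetched
    or reported as an error. A positive result means pages went missing.
    """
    if min(generated, fetched, errors) < 0:
        raise ValueError("counts must be non-negative")
    return generated - (fetched + errors)


# Hypothetical counts for illustration only.
gap = missing_pages(generated=1000, fetched=700, errors=50)
print(f"unaccounted pages: {gap}")
```

Running the same check once with each protocol plugin enabled would show whether the gap depends on the plugin, which is the comparison the posters describe.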

Re: problems http-client

2005-12-19 Thread Stefan Groschupf
OK, I will do that tomorrow! However, in case it is known to be buggy, perhaps we should not set it up as the default HTTP protocol plugin, as it is today. Newbies checking out Nutch will use the version that does not fetch all pages, since most people start with the standard configuration. On 19.12.2005

Re: problems http-client

2005-12-19 Thread Michael
The same problem occurs on FreeBSD 6.0 + jdk1.4.2. I think it was also reported some time ago by Rod Taylor. Switch to protocol-http. Stefan Groschupf wrote: Hi there, is there someone out there that can confirm a problem we discovered? We were wondering why not all pages of a generated segment were fetched.

Re: problems http-client

2005-12-19 Thread Andrzej Bialecki
Stefan Groschupf wrote: OK, I will do that tomorrow! However, in case it is known to be buggy, perhaps we should not set it up as the default HTTP protocol plugin, as it is today. Newbies checking out Nutch will use the version that does not fetch all pages, since most people start with the standard