Re: Nutch 2.0 release and MySQL

2012-07-23 Thread Joan Espasa Arxer
I see Lewis has already created the Jira Issue. The proposed patch works wonderfully, as expected. Thanks guys, I registered into Jira and the wiki to comment on any future Issues and experiences. On 20 July 2012 14:55, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Hi, On Fri, Jul 20,

Re: NegativeArraySizeException and problem advancing port rec# during fetching

2012-07-23 Thread Ferdy Galema
Hi, This most certainly has something to do with data corruption. When all your mappers succeed and the error is in reducer code (looks like your case), it may be MapReduce intermediate output that cannot be read successfully. Is there enough free space on your configured mapred.local.dir devices?
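The free-space check suggested above can be sketched roughly as follows. This is a minimal Python illustration, not part of Nutch or Hadoop: the helper name `free_space_report`, the example directory value, and the 1 GiB threshold are all assumptions, and the directory list should be replaced with the comma-separated value of your own mapred.local.dir setting.

```python
import shutil


def free_space_report(dirs, min_free_bytes):
    """Return (path, free_bytes, ok) for each directory in dirs.

    ok is True when the device holding the directory has at least
    min_free_bytes available.
    """
    report = []
    for path in dirs:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free
        report.append((path, usage.free, usage.free >= min_free_bytes))
    return report


if __name__ == "__main__":
    # Hypothetical value: paste your own mapred.local.dir setting here.
    mapred_local_dir = "."
    local_dirs = [d.strip() for d in mapred_local_dir.split(",") if d.strip()]

    # Assumed threshold of 1 GiB; choose one that fits your crawl size.
    for path, free, ok in free_space_report(local_dirs, 1024 ** 3):
        status = "ok" if ok else "LOW"
        print(f"{path}: {free / 1024 ** 2:.0f} MiB free [{status}]")
```

Run it on each node that executes map or reduce tasks, since the intermediate output is written to local disks rather than to HDFS.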

Re: Nutch 2.0 release and MySQL

2012-07-23 Thread Lewis John Mcgibbney
Hi Joan, On Mon, Jul 23, 2012 at 8:50 AM, Joan Espasa Arxer j...@anpro21.com wrote: Thanks guys, I registered into Jira and the wiki to comment on any future Issues and experiences. Excellent, thanks for reporting initially; anyway, it was a good catch which we obviously required in the

Re: NegativeArraySizeException and problem advancing port rec# during fetching

2012-07-23 Thread nutch.bu...@gmail.com
If I take certain crawl_generate input, and run fetch on it, this always happens on the same URL. The URL is a 300 MB txt file. But other big files run fine, without this problem. Also, if I open the file myself with Notepad++, it's not corrupted, it opens fine. I managed to debug it on the

Re: RSS parser

2012-07-23 Thread ShlomiJ
@Lewis Any update on the matter of the differences between Nutch 1.4 and 1.5? @all Any new insight on the question why the parse fails? ShlomiJ -- View this message in context: http://lucene.472066.n3.nabble.com/RSS-parser-tp3719558p3996772.html Sent from the Nutch - User mailing list

Re: RSS parser

2012-07-23 Thread Lewis John Mcgibbney
Hi, On Mon, Jul 23, 2012 at 5:00 PM, ShlomiJ shlomij...@gmail.com wrote: @Lewis Any update on the matter of the differences between Nutch 1.4 and 1.5? No difference between 1.4 and 1.5.1. Lewis

Crawl HTTPS websites/Enable Plugin

2012-07-23 Thread Kay
Hello Everyone, I am using Apache Nutch to crawl HTTP websites. But when I try to crawl an HTTPS site it's throwing the following error. “failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=https: I searched the forums and there were some suggestions to enable the

Re: Crawl HTTPS websites/Enable Plugin

2012-07-23 Thread remi tassing
So did it fail before or after you used protocol-httpclient? On 7/24/12, Kay uh.keer...@gmail.com wrote: Hello Everyone, I am using Apache Nutch to crawl HTTP websites. But when I try to crawl an HTTPS site it's throwing the following error. “failed with: