I see Lewis has already created the Jira Issue. The proposed patch works
wonderfully, as expected.
Thanks guys, I registered on Jira and the wiki to comment on any future
issues and experiences.
On 20 July 2012 14:55, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote:
Hi,
On Fri, Jul 20,
Hi,
This most certainly has something to do with data corruption. When all your
mappers succeed and the error is in reducer code (looks like your case), it
may be MapReduce intermediate output that cannot be read successfully. Is
there enough free space on your configured mapred.local.dir devices?
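
A quick way to answer the free-space question is to check each directory listed in mapred.local.dir. The following is only a minimal sketch in Python, not part of Nutch or Hadoop: the function names and the 1 GB threshold are hypothetical, and it assumes the property value is a comma-separated list of local paths copied by hand from your configuration.

```python
import shutil
from pathlib import Path


def split_local_dirs(value):
    """Split a comma-separated mapred.local.dir value into clean paths."""
    return [part.strip() for part in value.split(",") if part.strip()]


def free_space_report(dirs, min_free_bytes):
    """Map each directory to (free_bytes, ok).

    A directory that does not exist is reported as (None, False).
    """
    report = {}
    for d in dirs:
        if not Path(d).is_dir():
            report[d] = (None, False)
            continue
        free = shutil.disk_usage(d).free
        report[d] = (free, free >= min_free_bytes)
    return report


if __name__ == "__main__":
    # Hypothetical value; paste the one from your own configuration.
    local_dirs = split_local_dirs("/tmp/hadoop/mapred/local")
    # Hypothetical threshold of 1 GB; pick one that fits your job size.
    for path, (free, ok) in free_space_report(local_dirs, 1024 ** 3).items():
        status = "ok" if ok else "LOW OR MISSING"
        print(f"{path}: free={free} bytes [{status}]")
```

If a directory is reported as low or missing, intermediate map output written there may be incomplete, which would fit the reducer-side read failure described above.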
Hi Joan,
On Mon, Jul 23, 2012 at 8:50 AM, Joan Espasa Arxer j...@anpro21.com wrote:
Thanks guys, I registered on Jira and the wiki to comment on any future
issues and experiences.
Excellent, thanks for reporting it initially. Anyway, it was a good catch,
which we obviously required in the
If I take a certain crawl_generate input and run fetch on it, this always
happens on the same URL.
The URL is a 300 MB txt file.
But other large files run fine, without this problem. Also, if I open the
file myself with Notepad++, it's not corrupted; it opens fine.
I managed to debug it on the
@Lewis
Any update on the matter of the differences between Nutch 1.4 and 1.5?
@all
Any new insight on the question why the parse fails?
ShlomiJ
--
View this message in context:
http://lucene.472066.n3.nabble.com/RSS-parser-tp3719558p3996772.html
Sent from the Nutch - User mailing list
Hi,
On Mon, Jul 23, 2012 at 5:00 PM, ShlomiJ shlomij...@gmail.com wrote:
@Lewis
Any update on the matter of the differences between Nutch 1.4 and 1.5?
No difference between 1.4 and 1.5.1
Lewis
Hello Everyone,
I am using Apache Nutch to crawl HTTP websites. But when I try to crawl
an HTTPS site, it throws the following error.
“failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found
for url=https:
I searched the forums and there were some suggestions to enable the
So did it fail before or after you used protocol-httpclient?
On 7/24/12, Kay uh.keer...@gmail.com wrote:
Hello Everyone,
I am using Apache Nutch to crawl HTTP websites. But when I try to crawl
an HTTPS site, it throws the following error.
“failed with: