Sorry. I'm using 2.1. I did a general web search and didn't find any
instances of the problem. I found a couple tutorials using the
file:///data/mydir format with no mention of any issues.
The problem is that the normalizers (not sure which one) strip out that
leading /, which changes the URL.
Finally found it in JIRA.
https://issues.apache.org/jira/browse/NUTCH-1483
I'll give the patch a try and see if that fixes my issue.
On Wed, Mar 27, 2013 at 4:29 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Nutch version please?
Sebastian and others worked on this a while ago.
Hi, I have a really embarrassing question. After googling for this answer
for some time, I can't find out how to set the politeness level when I
crawl through a site. I don't want to bombard a site. Any thoughts or
pointers on how to do this?
I looked into ${APACHE_NUTCH_HOME}/conf/nutch-default.xml, and it
gives a very good explanation of each property that I can use to throttle my
crawling. I should be all set for now unless there's something that I'm
seriously not getting.
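
For anyone else landing here, a minimal sketch of what such throttling overrides might look like. This assumes the usual Nutch convention of putting overrides in conf/nutch-site.xml rather than editing nutch-default.xml. The property names below are the ones commonly found in nutch-default.xml, but they can differ between versions, so check your own copy. The values are purely illustrative.

```xml
<configuration>
  <!-- Illustrative values only; not taken from the original thread. -->
  <property>
    <!-- Seconds to wait between successive requests to the same server. -->
    <name>fetcher.server.delay</name>
    <value>10.0</value>
  </property>
  <property>
    <!-- Maximum number of threads fetching from one queue (host) at a time. -->
    <name>fetcher.threads.per.queue</name>
    <value>1</value>
  </property>
</configuration>
```

A longer delay and a single thread per queue keep the crawl gentle on any one host, at the cost of a slower overall crawl.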
-- Forwarded message --
From: Yves
Please also see
https://issues.apache.org/jira/browse/NUTCH-1484
Sebastian resolved this one and, AFAIK, fixed the problem.
On Thu, Mar 28, 2013 at 6:09 AM, Bai Shen baishen.li...@gmail.com wrote:
Finally found it in JIRA.
https://issues.apache.org/jira/browse/NUTCH-1483
I'll give the
Hi everyone,
does anybody have any idea why I am getting this error when I run generate
right after I inject to a new crawlId in local mode? (That is not to say
that this doesn't happen in deploy mode or on a preexisting crawlId; I
just haven't tested those.)
2013-03-28 11:06:21,911 INFO