When you get an error while fetching and see
org.apache.nutch.protocol.retrylater because the maximum number of retries
has been reached, Nutch says it has given up and will retry later. When does
that retry occur?
That's an issue I reported some weeks ago and which is in my opinion
an annoying
On 3/4/06, Stefan Groschupf:
Just a general note: JIRA has a voting feature.
It lets everybody vote on an issue and shows, in a very
compact form, what the community is looking for.
However, it is not used very often yet. It would be great if more
users made use of it.
That's a
Try increasing the value of this parameter:
<property>
  <name>fetcher.threads.per.host</name>
  <value>1</value>
</property>
This could help if you crawl pages from one host and if you run into time-outs.
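To illustrate why a per-host thread limit matters when all pages come from one host, here is a minimal sketch of such a limit. It is not Nutch's actual code: the class PerHostLimiter and its methods are hypothetical names made up for this example, and the constructor argument only plays the role of fetcher.threads.per.host.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/**
 * Illustrative per-host limiter (hypothetical, NOT Nutch's implementation).
 * Each host gets a fixed number of slots; a thread that finds no free slot
 * has to wait or give up, which is where time-outs can come from.
 */
final class PerHostLimiter {
    // Plays the role of fetcher.threads.per.host in this sketch.
    private final int maxThreadsPerHost;
    private final Map<String, Semaphore> slots = new ConcurrentHashMap<>();

    PerHostLimiter(int maxThreadsPerHost) {
        this.maxThreadsPerHost = maxThreadsPerHost;
    }

    private Semaphore slotsFor(String host) {
        // Lazily create one semaphore per host, sized to the configured limit.
        return slots.computeIfAbsent(host, h -> new Semaphore(maxThreadsPerHost));
    }

    /** Returns true if the caller may fetch from this host right now. */
    boolean tryAcquire(String host) {
        return slotsFor(host).tryAcquire();
    }

    /** Call once per successful tryAcquire, when the fetch has finished. */
    void release(String host) {
        slotsFor(host).release();
    }

    public static void main(String[] args) {
        String host = "www.example.com"; // placeholder host name
        PerHostLimiter strict = new PerHostLimiter(1);
        System.out.println("limit 1, first thread:  " + strict.tryAcquire(host));
        System.out.println("limit 1, second thread: " + strict.tryAcquire(host));

        PerHostLimiter relaxed = new PerHostLimiter(2);
        System.out.println("limit 2, first thread:  " + relaxed.tryAcquire(host));
        System.out.println("limit 2, second thread: " + relaxed.tryAcquire(host));
    }
}
```

With a limit of 1 the second thread is refused until the first one releases its slot; with a limit of 2 both can proceed. Keep in mind that a higher value also puts more load on the crawled server.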
By the way:
It's important to avoid time-outs because in Nutch 0.7.1 there is a bug that
prevents
to generate it (e.g. use the Apache log file)
- Enhance the Nutch HTML parser so that it can interpret the JavaScript links
Greetings
mos, from munich
On 2/3/06, [EMAIL PROTECTED] wrote:
Hello,
I have problems indexing a special internet site:
http://www.gildemeister.com
);
tool.updateForSegment(fileSystem, lseg);
tool.close();
Thanks
mos