Using Nutch nightly build nutch-2010-04-27_04-00-28:
I am trying to crawl a single HTML file generated by javadoc with
bin/nutch crawl, but no links are followed. I verified this with
bin/nutch readdb and bin/nutch readlinkdb, and also with luke-1.0.1.
Only the single base seed document specified is processed.
I
Hi,
I am new to the world of Nutch. I am trying to crawl local file
systems on a LAN using Nutch 1.0 and then search them using Solr.
Documents are rarely modified, but the recrawl frequency is 1 day
because documents are frequently added and deleted. I have a few
queries regarding recrawling.
1. What i
Hi,
I posted the same query a few weeks back; sorry for asking it again.
I am having a problem updating the crawldb from the crawled
segments. When I run the command it takes too long, and after 1200
seconds the crawldb update fails, so I am unable to do
incremental cra
On 2010-04-26 22:31, Joshua J Pavel wrote:
>
> Sending this out to close the thread if anyone else experiences this
> problem: nutch 1.0 is not AIX-friendly (0.9 is).
>
> I'm not 100% sure which command it may be, but by modifying my path so
> that /opt/freeware/bin has precedence, I no longer ge