nutch crawl issue

2010-04-27 Thread matthew a. grisius
Using Nutch nightly build nutch-2010-04-27_04-00-28: I am trying to bin/nutch crawl a single HTML file generated by javadoc, and no links are followed. I verified this with bin/nutch readdb and bin/nutch readlinkdb, and also with luke-1.0.1. Only the single base seed doc specified is processed. I

Issues in recrawling

2010-04-27 Thread arpit khurdiya
Hi, I'm new to the world of Nutch. I am trying to crawl local file systems on a LAN using Nutch 1.0 and then search them using Solr. Documents are rarely modified, and the recrawl frequency is 1 day because documents are frequently added and deleted. I have a few queries regarding recrawling. 1. What i

Problem while updating crawldb from segments directory

2010-04-27 Thread hareesh
Hi, I posted the same query a few weeks back; sorry for asking it again. I am having a problem while updating the crawldb from the crawled segments. When I try to run the command, it takes too long, and after 1200 seconds the update to the crawldb fails, so I am unable to do incremental cra

Re: Hadoop Disk Error

2010-04-27 Thread Andrzej Bialecki
On 2010-04-26 22:31, Joshua J Pavel wrote:
> Sending this out to close the thread if anyone else experiences this problem: nutch 1.0 is not AIX-friendly (0.9 is).
>
> I'm not 100% sure which command it may be, but by modifying my path so that /opt/freeware/bin has precedence, I no longer ge