For those of you who, like me, are reinventing the wheel by
getting nutch-0.8-dev with MapReduce running on a single box,
here are some updates.
They apply to revision #374443.

The DmozParser class mentioned in "quick tutorial for nutch
0.8 and later" seems to be "org.apache.nutch.tools.DmozParser"
and not "org.apache.nutch.crawl.DmozParser".

Against all odds I managed to get a single web page fetched,
as both the log from my web server and the tasktracker log
show.

To get more information in the log files, set every property
in nutch-default.xml whose name contains the substring
"verbose" to "true".
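
For anyone who would rather not edit the file by hand, here is a
minimal sketch of that step in Python. It assumes the usual layout of
<property> elements, each with a <name> and a <value> child, and it
does not depend on the name of the root element. The function name
enable_verbose and the file names in the comments are my own and not
part of Nutch.

```python
import xml.etree.ElementTree as ET


def enable_verbose(root):
    """Set <value> to "true" for every <property> whose <name>
    contains the substring "verbose".

    Returns the names of the properties that were changed.
    """
    changed = []
    for prop in root.iter("property"):
        name = prop.findtext("name", default="")
        value = prop.find("value")
        if "verbose" in name and value is not None:
            value.text = "true"
            changed.append(name)
    return changed


# Example use (both file names are assumptions):
# tree = ET.parse("conf/nutch-default.xml")
# print(enable_verbose(tree.getroot()))
# tree.write("conf/nutch-default.verbose.xml")
```

Note that ElementTree drops comments and the XML declaration when it
writes the file back, so it is probably safer to write to a copy and
compare it with the original before replacing anything.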

As far as I could figure out, there will be no index under the
"/tmp/nutch/mapred/local/index/" directory.
I think it will be included in a file named "/tmp/nutch/ndfs/name/edits".

The user interface is running, and I keep ROOT/WEB-INF/classes
in sync with the nutch/conf/ directory. The footer.html file
is missing from each language directory, so copy it from e.g.
include/footer.html to en/include/footer.html.
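
The copy can be repeated for several language directories with a
small script. This is only a sketch: it assumes the web application
root contains include/footer.html and one directory per language code,
and the function name copy_footer and the list of languages are my own
choices.

```python
import shutil
from pathlib import Path


def copy_footer(webapp_root, languages):
    """Copy include/footer.html into <lang>/include/ for every
    language directory that does not have a footer yet.

    Returns the language codes that received a copy.
    """
    root = Path(webapp_root)
    source = root / "include" / "footer.html"
    copied = []
    for lang in languages:
        target = root / lang / "include" / "footer.html"
        if target.exists():
            continue  # leave an existing footer untouched
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, target)
        copied.append(lang)
    return copied


# Example use (path and language codes are assumptions):
# copy_footer("webapps/ROOT", ["en", "de"])
```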

What I have not managed is to access the index from the
user interface. How does the user interface know that I
named my index "myindexTargetFolder" as in the tutorial?
A mystery...
Maybe there is a property to set somewhere...

Regards,
Bernd


Bernd Fehling wrote:
Went through the tutorial for nutch 0.8.
No further error messages.
Everything seems to run fine, but where is the index?

I used a single URL to start with, but searching for
any term from that site gives no results.
I guess there is no index at all?

Where can I find a crawler log file?

Bernd

Stefan Groschupf wrote:

Is it just a simple text file with one URL per line?


Yes



