For those of you who, like me, are reinventing the wheel by
getting nutch-0.8-dev with MapReduce running on a single box,
here are some updates.
This applies to revision #374443.
The DmozParser class mentioned in "quick tutorial for nutch
0.8 and later" seems to be in "org.apache.nutch.tools.DmozParser"
and not in "org.apache.nutch.crawl.DmozParser".
Against all odds I managed to get a single web page fetched,
as both the log from my web server and the tasktracker log
show.
Set every property in the file nutch-default.xml whose name
contains the substring "verbose" to "true" to get more info
in the log files.
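
A minimal sketch of that change in Python, in case someone prefers a script over editing by hand. It assumes the usual property/name/value layout of the config file; the function name enable_verbose and the two sample properties are only illustrative and are not taken from the actual file. Note that ElementTree drops comments when parsing, so this is a sketch and not something to run blindly over the real nutch-default.xml.

```python
import xml.etree.ElementTree as ET


def enable_verbose(xml_text):
    """Return the config text with every property whose name contains
    'verbose' set to 'true'. Other properties are left unchanged."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = prop.findtext("name", default="")
        value = prop.find("value")
        if "verbose" in name and value is not None:
            value.text = "true"
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample; the root element and property names are assumptions.
SAMPLE = """<configuration>
  <property><name>fetcher.verbose</name><value>false</value></property>
  <property><name>http.timeout</name><value>10000</value></property>
</configuration>"""

if __name__ == "__main__":
    print(enable_verbose(SAMPLE))
```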
As far as I could figure out, there will be no index under the
"/tmp/nutch/mapred/local/index/" directory.
I think it will be included in a file named "/tmp/nutch/ndfs/name/edits".
The user interface is running, and I keep ROOT/WEB-INF/classes
in sync with the nutch/conf/ directory. The footer.html file
is missing in each language directory, so copy it there, e.g.
from include/footer.html to en/include/footer.html.
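
The copy step could be scripted roughly like this. It is only a sketch under assumptions: the function name copy_footer is made up, and it treats every subdirectory of the webapp root that has its own include/ subdirectory as a language directory, which may not match every deployment.

```python
import shutil
from pathlib import Path


def copy_footer(webapp_root):
    """Copy include/footer.html into <lang>/include/ wherever it is missing.

    Returns the list of files that were created.
    """
    webapp_root = Path(webapp_root)
    source = webapp_root / "include" / "footer.html"
    copied = []
    for lang_dir in sorted(webapp_root.iterdir()):
        include_dir = lang_dir / "include"
        # Assumption: only directories with an include/ subdirectory are
        # language directories (this skips WEB-INF and the top-level include/).
        if not include_dir.is_dir():
            continue
        target = include_dir / "footer.html"
        if not target.exists():
            shutil.copyfile(source, target)
            copied.append(target)
    return copied
```

Calling copy_footer with the path of the ROOT webapp directory would fill in the missing files and leave existing ones alone.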
What I did not manage is to access the index from the
user interface. How does the user interface know that I
named my index "myindexTargetFolder" as in the tutorial?
Mystery...
Maybe there is a property to set somewhere...
Regards,
Bernd
Bernd Fehling wrote:
Went through the tutorial for nutch 0.8.
No further error messages.
Everything seems to run fine, but where is the index?
I used a single URL to start with, but searching for
any term from that site gives no results.
I guess there is no index at all?
Where can I find a crawler log file?
Bernd
Stefan Groschupf wrote:
Is it just a simple text file with one URL per line?
Yes