For those of you who, like me, are reinventing the wheel by
getting nutch-0.8-dev with MapReduce running on a single box,
here are some updates.
This applies to revision #374443.
The DmozParser class mentioned in "quick tutorial for nutch
0.8 and later" seems to be in "org.apache.nutch.tools.DmozParser"
and not in "org.apache.nutch.crawl.DmozParser".
Against all odds I managed to get a single web page fetched,
as both the log from my web server and the tasktracker log
show.
In the file nutch-default.xml, set every property whose name
contains the substring "verbose" to "true" to get more info
from the log files.
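
The "verbose" change could be scripted roughly as follows. This is
only a sketch: it assumes the file consists of property entries with
name and value children, the function name enable_verbose is made up
for illustration, and the sample XML is hypothetical rather than
copied from the real nutch-default.xml.

```python
import xml.etree.ElementTree as ET


def enable_verbose(xml_text):
    """Return (new_xml, changed_names) with every property whose name
    contains 'verbose' set to 'true'.

    Assumes <property><name/><value/></property> entries; the name of
    the root element does not matter.
    """
    root = ET.fromstring(xml_text)
    changed = []
    for prop in root.iter("property"):
        name = prop.findtext("name", default="")
        value = prop.find("value")
        if "verbose" in name and value is not None:
            value.text = "true"
            changed.append(name)
    return ET.tostring(root, encoding="unicode"), changed


# Hypothetical sample input, for illustration only.
SAMPLE = """<conf>
  <property><name>fetcher.verbose</name><value>false</value></property>
  <property><name>http.timeout</name><value>10000</value></property>
</conf>"""

new_xml, changed = enable_verbose(SAMPLE)
print(changed)
```

Note that ElementTree drops XML comments when parsing, so keep a
backup of the original file before writing the result back.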
As far as I could figure out, there will be no index under the
"/tmp/nutch/mapred/local/index/" directory.
I think it will be included in a file named "/tmp/nutch/ndfs/name/edits".
The user interface is running and I keep ROOT/WEB-INF/classes
in sync with the nutch/conf/ directory. The footer.html file
is missing in each language directory, so copy it, e.g. from
include/footer.html to en/include/footer.html.
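
For several language directories the copy could be done in one go. A
minimal sketch, assuming both paths are relative to the web
application's root directory; the function name copy_footer and the
list of language codes are placeholders, not something Nutch provides.

```python
import shutil
from pathlib import Path


def copy_footer(webapp_root, languages):
    """Copy <root>/include/footer.html to <root>/<lang>/include/ for
    each language directory that has no footer yet.

    Returns the list of files that were created.
    """
    root = Path(webapp_root)
    source = root / "include" / "footer.html"
    created = []
    for lang in languages:
        target = root / lang / "include" / "footer.html"
        if target.exists():
            continue  # leave an existing footer alone
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, target)
        created.append(target)
    return created
```

A call would look like copy_footer("/path/to/ROOT", ["en", "de"]),
where both the path and the language codes have to be adapted to the
local installation.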
What I didn't manage is getting access to the index from the
user interface. How does the user interface know that I
named my index "myindexTargetFolder" as in the tutorial?
Mystery...
Maybe a property to set somewhere...
Regards,
Bernd
Bernd Fehling wrote:
Went through the tutorial for nutch 0.8.
No further error messages.
All seems to run fine but where is the index?
Used a single URL to start with but searching for
any term from that site gives no results.
I guess there is no index at all?
Where to find a crawler log file?
Bernd
Stefan Groschupf wrote:
Is it just a simple text file with one URL per line?
Yes
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general