Hi:

Have you looked at the nutch-default.xml config file, under
<name>searcher.dir</name>?
You need to modify this to point to the place in DFS where your crawl
directory is. I think you will have something like /user/nutch etc.; you
can find it by trying the following:

bin/hadoop dfs

and

bin/hadoop dfs -ls

Do you see anything there? (DFS was previously called NDFS.)
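
If you do see your crawl directory there (say /user/nutch/crawl -- just a
guess, use whatever -ls actually shows), the override would look roughly
like this in nutch-default.xml (or in nutch-site.xml):

  <property>
    <name>searcher.dir</name>
    <value>/user/nutch/crawl</value>
    <description>Directory containing the crawl (index, segments) that
    the search webapp should use.</description>
  </property>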

I am not sure this will help...

On 2/7/06, Bernd Fehling <[EMAIL PROTECTED]> wrote:
> For those of you who, like me, are reinventing the wheel and getting
> nutch-0.8-dev with MapReduce running on a single box, here are some
> updates. This refers to revision #374443.
>
> The DmozParser class mentioned in the "quick tutorial for nutch
> 0.8 and later" seems to be in "org.apache.nutch.tools.DmozParser"
> and not in "org.apache.nutch.crawl.DmozParser".
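>
> So the tutorial command would presumably become something like this
> (the -subset value and output path are only examples):
>
>   bin/nutch org.apache.nutch.tools.DmozParser content.rdf.u8 -subset 3000 > dmoz/urls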
>
> Against all odds I managed to get a single web page fetched,
> as both the log from my web server and the tasktracker log
> confirm.
>
> Set all properties in nutch-default.xml whose names contain the
> substring "verbose" to "true" to get more information in the
> log files.
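>
> For example, one such property (at least in my copy of
> nutch-default.xml) is:
>
>   <property>
>     <name>fetcher.verbose</name>
>     <value>true</value>
>     <description>If true, fetcher will log more verbosely.</description>
>   </property>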
>
> As far as I could figure out, there will be no index under the
> "/tmp/nutch/mapred/local/index/" directory.
> I think it will be included in a file named "/tmp/nutch/ndfs/name/edits".
>
> The user interface is running and I keep ROOT/WEB-INF/classes
> in sync with the nutch/conf/ directory. The footer.html file
> is missing in each language directory, so copy it from e.g.
> include/footer.html to en/include/footer.html.
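>
> Something like this should do it (assuming en, de, fr are the language
> directories actually present in your webapp; adjust as needed):
>
>   cd ROOT                     # or wherever the Nutch webapp is deployed
>   for lang in en de fr; do
>     cp include/footer.html $lang/include/footer.html
>   done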
>
> What I didn't manage was getting access to the index from the
> user interface. How does the user interface know that I
> named my index "myindexTargetFolder" as in the tutorial?
> Mystery...
> Maybe there is a property to set somewhere...
>
> Regards,
> Bernd
>
>
> Bernd Fehling wrote:
> > Went through the tutorial for nutch 0.8.
> > No further error messages.
> > All seems to run fine, but where is the index?
> >
> > Used a single URL to start with but searching for
> > any term from that site gives no results.
> > I guess there is no index at all?
> >
> > Where to find a crawler log file?
> >
> > Bernd
> >
> > Stefan Groschupf wrote:
> >
> >>> Is it just a simple text file with one URL per line?
> >>
> >>
> >> Yes
> >>
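> >> For example, a minimal seed file (the URL is just a placeholder)
> >> would contain nothing more than:
> >>
> >>   http://www.example.com/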
> >
> >
>

