Looks like I'm out on my own here, so for anyone else who wants to set
up a minimal Nutch in a web-app purely as a site search tool, here's
what I did as a first step.

 - Stripped the config files down to the minimum required for a web-app:
nutch-site.xml and common-terms.utf8, both in WEB-INF/classes so that
they get picked up _before_ the contents of nutch-0.9.jar.

 - In WEB-INF/classes/nutch-site.xml, set searcher.dir and plugin.folders
to hard-coded paths in the Nutch installation directory (this needs to
change; more in subsequent postings):
        <property>
                <name>searcher.dir</name>
                <value>c:/nutch-0.9/crawl.mysite</value>
        ...
        <property>
                <name>plugin.folders</name>
                <value>c:/nutch-0.9/plugins</value>
        ...

 - Added the required jars to WEB-INF/lib: I needed the nutch-0.9,
lucene-core, lucene-misc and hadoop jar files. (My app already contains
a bunch of other jars that are probably needed too, but adding those
listed got it working.)
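The servlet spec has the web-app class loader load from WEB-INF/classes first and only then from the jars in WEB-INF/lib, which is why the loose copies win. To confirm which copy is actually in use, a JDK-only diagnostic can ask the class loader where it resolves each name. This is a sketch under two assumptions: that Nutch reads these files through the class loader (the "not found" message from Hadoop's Configuration in the quoted mail suggests so), and that the thread context class loader is the relevant one. ClasspathCheck is a hypothetical name, not part of Nutch.

```java
import java.net.URL;

// Hypothetical diagnostic, not part of Nutch: reports where the class loader
// finds each config resource, so you can see which copy is being used.
final class ClasspathCheck {

    // Returns the URL the loader resolves the resource to, or null if the
    // resource is not visible to that loader at all.
    static String locate(ClassLoader loader, String name) {
        URL url = loader.getResource(name);
        return url == null ? null : url.toExternalForm();
    }

    public static void main(String[] args) {
        // Web containers set the thread context class loader to the web-app's
        // loader; fall back to this class's own loader outside a container.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathCheck.class.getClassLoader();
        }
        for (String name : new String[] {"nutch-site.xml", "common-terms.utf8"}) {
            String where = locate(loader, name);
            System.out.println(name + " -> " + (where == null ? "NOT FOUND" : where));
        }
    }
}
```

Called from inside the web-app (for example once at controller start-up, logging instead of printing), a file: URL ending in WEB-INF/classes/nutch-site.xml means the loose copy won; a jar: URL means a copy packed in a jar was found first.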

Now I can initialize NutchBean in my viewController (yes, I'm working
under JSF in Spring) and use it like this:

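            // NutchConfiguration (org.apache.nutch.util) loads nutch-default.xml
            // and nutch-site.xml through the class loader
            conf = NutchConfiguration.create();
            bean = new NutchBean(conf);  // org.apache.nutch.searcher.NutchBean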
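Since NutchBean appears to open the index under searcher.dir when it is constructed, it probably makes sense to build it once and share it rather than create one per request. A minimal JDK-only sketch of a thread-safe lazy holder follows; Lazy is a hypothetical name, not part of Nutch or Spring. If the controller is already a singleton-scoped Spring bean (Spring's default scope), a plain field gives the same effect.

```java
import java.util.function.Supplier;

// Hypothetical helper: creates a value on first use and returns that same
// instance to every later caller, without locking once it is set.
final class Lazy<T> {
    private final Supplier<T> factory;
    // volatile so a fully constructed value is visible to other threads
    private volatile T value;

    Lazy(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T v = value;
        if (v == null) {                 // fast path skips the lock once set
            synchronized (this) {
                v = value;
                if (v == null) {         // re-check: another thread may have won
                    v = factory.get();
                    value = v;
                }
            }
        }
        return v;
    }
}
```

In the controller the factory would build the bean, e.g. new NutchBean(NutchConfiguration.create()), with the constructor's checked IOException wrapped (for instance in an UncheckedIOException) because Supplier cannot throw checked exceptions.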

With that in place, NutchBean picks up my index and runs searches.
However, it's still loading much more than I need: the logs show a huge
set of plugins being picked up from the plugin directory. More about
that in another e-mail, to keep things clear.
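For reference ahead of that follow-up: nutch-default.xml describes plugin.includes (with a companion plugin.excludes) as a regular expression over plugin directory names, so overriding it in nutch-site.xml is a likely way to trim the list, though not yet tried in this setup. The JDK-only sketch below shows that selection logic under the assumption that a pattern must match the whole directory name; PluginFilter and the sample names and pattern are illustrative, not Nutch code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of include/exclude filtering of plugin directory names.
final class PluginFilter {

    static List<String> select(List<String> pluginIds, String includes, String excludes) {
        Pattern inc = Pattern.compile(includes);
        Pattern exc = Pattern.compile(excludes);
        List<String> kept = new ArrayList<>();
        for (String id : pluginIds) {
            // Keep a plugin only if its whole name matches the include pattern
            // and does not match the exclude pattern (assumed semantics).
            if (inc.matcher(id).matches() && !exc.matcher(id).matches()) {
                kept.add(id);
            }
        }
        return kept;
    }
}
```

With the names query-basic, query-site, parse-pdf and protocol-http, an include pattern of query-(basic|site|url)|summary-basic and an empty exclude pattern, only query-basic and query-site are kept.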

Cheers,
Ian.



> -----Original Message-----
> From: Ian.Priest [mailto:[EMAIL PROTECTED]
> Sent: 08 May 2007 14:42
> To: [EMAIL PROTECTED]
> Subject: Newbie hello and web-setup question
> 
> Hi all,
> 
> I'm just starting out using Nutch and have a (probably basic) question
> about configuration.
> 
> I want to use Nutch to provide search facilities for a website. I have
> installed it at c:/nutch-0.9, edited the files in c:/nutch-0.9/conf
> and created a crawl index at c:/nutch-0.9/crawl.mysite.
> 
> Now I'm trying to use NutchBean to return some search results to a
> page on my site. I'm using Tomcat and have added nutch-0.9.jar to the
> deployed war file. However, I'm having trouble getting Nutch to run
> against the external-to-tomcat directory.
> 
> Specifically, I don't want copies of the Nutch config files in the
> deployed webapp, to avoid the management overhead on the live site of
> keeping two copies of each file in sync, so I want to convince the
> webapp to use c:/nutch-0.9/conf to get nutch-site.xml and so forth.
> 
> I can get most of the config loaded by loading the xml files into a
> configuration object like this:
> 
>           // File is java.io.File; Configuration is org.apache.hadoop.conf.Configuration
>           //conf = NutchConfiguration.create();
>           conf = new Configuration();
> 
>           // Add Nutch config files using nutchPath as a base to override defaults.
>           // toURI().toURL() escapes characters that the deprecated File.toURL() does not.
>           File defaultFile = new File(nutchPath + "/conf/nutch-default.xml");
>           if ( defaultFile.exists() ) {
>                 conf.addDefaultResource(defaultFile.toURI().toURL());
>           }
>           File siteFile = new File(nutchPath + "/conf/nutch-site.xml");
>           if ( siteFile.exists() ) {
>                 conf.addFinalResource(siteFile.toURI().toURL());
>           }
> 
>           bean = new NutchBean(conf);
> 
> but for some reason it won't pick up a file called common-terms.utf8.
> It just reports:
> 
> 2007-05-08 13:50:14,318 [main] INFO [org.apache.hadoop.conf.Configuration]
> C:/nutch-0.9/conf/common-terms.utf8 not found
> 
> This happens even though the file is certainly present. It then throws
> an NPE because it can't find the file:
> 
> java.lang.NullPointerException
>                 at java.io.Reader.<init>(Reader.java:61)
>                 at java.io.BufferedReader.<init>(BufferedReader.java:76)
> ...
> 
> Does anyone know where I'm going wrong, and how I can configure it so
> I don't have to include the config files in my war?
> 
> Cheers,
> Ian.
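A likely reading of the common-terms.utf8 error in the quoted message, offered as an interpretation rather than a confirmed diagnosis: the "not found" line comes from org.apache.hadoop.conf.Configuration, which suggests the file is looked up by name through a class loader rather than opened as a file-system path. That would explain why copying it into WEB-INF/classes (the first bullet above) works, and it means keeping the files outside the war would require c:/nutch-0.9/conf to be visible to that class loader; how best to arrange that in Tomcat is left open here. The JDK-only sketch below (ResourceLookupDemo is a hypothetical name, not Nutch code) shows both halves: a lookup sees only files under the loader's roots, and passing the resulting null to new BufferedReader(...) fails in Reader's constructor, as in the quoted trace.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

final class ResourceLookupDemo {

    // A loader whose only root is dir; the null parent keeps the rest of the
    // classpath out of the lookup.
    static ClassLoader loaderFor(File dir) {
        try {
            return new URLClassLoader(new URL[] {dir.toURI().toURL()}, null);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Opens a resource by name; returns null instead of throwing when the
    // loader cannot see it.
    static Reader open(ClassLoader loader, String name) {
        URL url = loader.getResource(name);
        if (url == null) {
            return null;
        }
        try {
            return new InputStreamReader(url.openStream(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Mirrors the failing caller: new BufferedReader(null) throws
    // NullPointerException from java.io.Reader's constructor.
    static BufferedReader wrap(Reader r) {
        return new BufferedReader(r);
    }

    public static void main(String[] args) throws IOException {
        File external = Files.createTempDirectory("nutch-conf").toFile();    // plays c:/nutch-0.9/conf
        File webapp = Files.createTempDirectory("web-inf-classes").toFile(); // plays an empty WEB-INF/classes
        // File contents are irrelevant here; only its location matters.
        Files.write(new File(external, "common-terms.utf8").toPath(),
                "placeholder\n".getBytes(StandardCharsets.UTF_8));

        // The file exists on disk, but not under this loader's root.
        System.out.println(open(loaderFor(webapp), "common-terms.utf8"));
        try {
            wrap(open(loaderFor(webapp), "common-terms.utf8"));
        } catch (NullPointerException e) {
            System.out.println("NPE, as in the quoted trace");
        }

        // Once the file's directory is a loader root, the same name resolves.
        try (Reader r = open(loaderFor(external), "common-terms.utf8")) {
            System.out.println(r != null);
        }
    }
}
```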



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
