Looks like I'm out on my own here, so for anyone else who wants to set
up a minimal Nutch in a web-app, just to use as a site search tool,
here's what I did as a first step.
- Stripped the config files down to the minimum required for a web-app:
nutch-site.xml and common-terms.utf8, both in WEB-INF/classes so that
they get picked up _before_ the contents of nutch-0.9.jar
- In WEB-INF/classes/nutch-site.xml set up the searcher.dir and plugin
directory as hard-coded paths to the Nutch installation directory (this
needs to change - more in subsequent postings)
<property>
<name>searcher.dir</name>
<value>c:/nutch-0.9/crawl.mysite</value>
...
<property>
<name>plugin.folders</name>
<value>c:/nutch-0.9/plugins</value>
...
- Added the required jars to WEB-INF/lib: I needed the nutch-0.9,
lucene-core, lucene-misc and hadoop jar files. (My app already contains
a bunch of other jars that are probably needed too, but adding those
listed got it working.)
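
The reason for putting the two files in WEB-INF/classes is classpath
ordering: a servlet container searches WEB-INF/classes before the jars
in WEB-INF/lib, so the first copy of a resource name wins. Below is only
a stand-alone sketch of that "first match wins" behaviour, using a plain
URLClassLoader over two temporary directories. The class name
ClasspathOrderDemo, the helper methods and the file contents are made up
for illustration; it does not touch Nutch or Tomcat.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo class, not part of Nutch.
public class ClasspathOrderDemo {

    /** Writes a file called name into dir with the given text and returns dir. */
    static Path writeResource(Path dir, String name, String text) throws IOException {
        Files.write(dir.resolve(name), text.getBytes(StandardCharsets.UTF_8));
        return dir;
    }

    /** Returns the content of the first copy of name found when searching dirs in order. */
    static String firstMatch(String name, Path... dirs) throws Exception {
        URL[] urls = new URL[dirs.length];
        for (int i = 0; i < dirs.length; i++) {
            urls[i] = dirs[i].toUri().toURL();
        }
        // A null parent keeps the JVM's own application classpath out of the lookup.
        try (URLClassLoader loader = new URLClassLoader(urls, null)) {
            URL found = loader.getResource(name);
            if (found == null) {
                return null;
            }
            return new String(Files.readAllBytes(Paths.get(found.toURI())), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for WEB-INF/classes and for the contents of a jar in WEB-INF/lib.
        Path classesDir = writeResource(Files.createTempDirectory("classes"),
                "nutch-site.xml", "copy in WEB-INF/classes");
        Path jarDir = writeResource(Files.createTempDirectory("lib"),
                "nutch-site.xml", "copy inside the jar");

        // The directory listed first shadows the later one.
        System.out.println(firstMatch("nutch-site.xml", classesDir, jarDir));
    }
}
```

Swapping the two arguments to firstMatch returns the other copy, which
is the situation to avoid in the web-app.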
Now I can initialize NutchBean in my viewController (yes, I'm working
under JSF in Spring) and use it like this:
conf = NutchConfiguration.create();
bean = new NutchBean(conf);
And it picks up my index and runs searches. However, it's still loading
much more than I need: the logs show a huge set of plugins being
picked up from the plugin directory. More about that in another e-mail
to keep things clear.
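
On the "common-terms.utf8 not found" message from my original mail
(quoted below): one plausible reading, assuming the configuration code
resolves that name through the class loader rather than the file system
(not verified here against the Nutch 0.9 or Hadoop source), is that a
file can exist on disk and still not be found as a classpath resource.
The sketch below shows only that general Java behaviour; the class
ResourceLookupDemo and its helper are hypothetical and it does not call
any Nutch or Hadoop code.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of Nutch or Hadoop.
public class ResourceLookupDemo {

    /** Looks name up relative to root the way a class loader does, not via the file system. */
    static boolean foundOnClasspath(Path root, String name) throws IOException {
        URL[] roots = { root.toUri().toURL() };
        // A null parent keeps the JVM's own application classpath out of the lookup.
        try (URLClassLoader loader = new URLClassLoader(roots, null)) {
            return loader.getResource(name) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an external conf directory.
        Path confDir = Files.createTempDirectory("conf");
        Path terms = Files.createFile(confDir.resolve("common-terms.utf8"));

        // The file exists on disk.
        System.out.println("exists on disk: " + Files.exists(terms));
        // It is found by its bare name when confDir is a class loader root.
        System.out.println("found by name:  " + foundOnClasspath(confDir, "common-terms.utf8"));
        // It is not found when the absolute path is used as the resource name.
        System.out.println("found by path:  " + foundOnClasspath(confDir, terms.toString()));
    }
}
```

If that reading is right, it would also explain why dropping the file
into WEB-INF/classes, where it is visible by its bare name, made the
error go away.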
Cheers,
Ian.
> -----Original Message-----
> From: Ian.Priest [mailto:[EMAIL PROTECTED]
> Sent: 08 May 2007 14:42
> To: [EMAIL PROTECTED]
> Subject: Newbie hello and web-setup question
>
> Hi all,
>
>
>
> I'm just starting out using Nutch and have a (probably basic) question
> about configuration.
>
>
>
> I want to use Nutch to provide search facilities for a website. I have
> installed it at c:/nutch-0.9, edited the files in c:/nutch-0.9/conf
> and created a crawl index at c:/nutch-0.9/crawl.mysite.
>
>
>
> Now I'm trying to use NutchBean to return some search results to a
> page on my site. I'm using Tomcat and have added nutch-0.9.jar to the
> deployed war file. However, I'm having trouble getting Nutch to run
> against the external-to-tomcat directory.
>
> Specifically, I don't want to have copies of the Nutch config files in
> the deployed webapp to avoid management overhead on the live site
> associated with keeping two copies of the file in sync, so I want to
> convince the webapp to use c:/nutch-0.9/conf to get nutch-site.xml and
> so forth.
>
>
>
> I can get most of the config loaded by loading the xml files into a
> configuration object like this...
>
>
>
> //conf = NutchConfiguration.create();
> conf = new Configuration();
>
> // Add Nutch config files using nutchPath as a base to over-ride defaults
> File defaultFile = new File(nutchPath + "/conf/nutch-default.xml");
> if ( defaultFile.exists() ) {
>     conf.addDefaultResource(defaultFile.toURL());
> }
> File siteFile = new File(nutchPath + "/conf/nutch-site.xml");
> if ( siteFile.exists() ) {
>     conf.addFinalResource(siteFile.toURL());
> }
>
> bean = new NutchBean(conf);
>
>
>
> but for some reason it won't pick up a file called common-terms.utf8.
> It just reports:
>
>
>
> 2007-05-08 13:50:14,318 [main] INFO [org.apache.hadoop.conf.Configuration] C:/nutch-0.9/conf/common-terms.utf8 not found
>
>
>
> Although that file is certainly present. And then it throws an NPE
> because it can't find the file:
>
>
>
> java.lang.NullPointerException
>     at java.io.Reader.<init>(Reader.java:61)
>     at java.io.BufferedReader.<init>(BufferedReader.java:76)
>     ...
>
>
>
> Anyone know where I'm going wrong and how I can configure it so I
> don't have to include the config files in my war?
>
>
>
> Cheers,
>
> Ian.
>
>
>
>
>
>