Re: AW: Ideal development environment for nutch plugins?

Silvio Heuberger Tue, 02 Dec 2008 04:28:22 -0800

Hi Martina

I knew that changes to the config files would not be picked up until a full rebuild of the project (or, as a matter of fact, until I copy them over to the tmp_build directory).

What substantially contributes to my current headache is the fact that I suddenly had ClassNotFoundExceptions, and now the crawler ignores URLs that it did not ignore yesterday.

I will definitely have to set up my own SVN repo for the work on the plugin. Then I can tag versions that do run, just in case something goes bad again. Right now, the crawler is useless if started from Eclipse. If started from the command line, it works as intended; however, the plugin does not yet. I figure this is because the special fields I extract from the documents are not yet added as fields to the Lucene documents.

But as long as Nutch is not debuggable, there is no telling...



Koch Martina wrote:

Hi Silvio,

I'm not sure I understood you correctly, but you might have a problem with
the conf file locations.
When I run Nutch within Eclipse, the standard conf directory in <NUTCH_HOME>/conf
is not used; instead, the conf files in the build directory (<NUTCH_HOME>/tmp_build, if
you followed the instructions in the wiki) are used. The conf files in the tmp_build
directory are a copy of the conf directory, made during the build. If you change
the files in conf, the changes are not picked up until you do a clean and build
afterwards.
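
One way to check which copy of the conf files a given launch really sees is to ask the classloader where it finds them. The following is only a minimal sketch, assuming the conf files are looked up by name on the classpath; the class name ConfLocator and its locate method are made up for illustration and are not part of Nutch.

```java
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Nutch): shows where conf files resolve from. */
final class ConfLocator {

    /** Maps each resource name to the URL it resolves to, or null if it is not on the classpath. */
    static Map<String, URL> locate(String... names) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ConfLocator.class.getClassLoader();
        }
        // LinkedHashMap keeps the names in the order they were asked for.
        Map<String, URL> found = new LinkedHashMap<String, URL>();
        for (String name : names) {
            found.put(name, cl.getResource(name));
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumption: these are the two file names discussed in this thread.
        for (Map.Entry<String, URL> e : locate("nutch-default.xml", "nutch-site.xml").entrySet()) {
            String where = (e.getValue() == null) ? "NOT ON CLASSPATH" : e.getValue().toString();
            System.out.println(e.getKey() + " -> " + where);
        }
    }
}
```

Run once with the Eclipse launch configuration and once with the command-line classpath; if the printed locations differ (for example tmp_build versus conf), the two runs are reading different files.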

Hope this helps.

Martina


-----Original Message-----
From: Silvio Heuberger [mailto:[EMAIL PROTECTED]]
Sent: 02 December 2008 11:17
To: [email protected]
Subject: Re: Ideal development environment for nutch plugins?

Let me clarify this:
The behaviour of running the Crawl class from within Eclipse is radically different from running it from the command line.
Silvio Heuberger wrote:
OK, here's supposedly an easy one:
What is the ideal setup to develop Nutch plugins? I'm an Eclipse lover, so I set myself up with a checkout of the 0.9 branch. The Eclipse project is set up fine, and I have been able to use the debugger to trace where the code goes and where it doesn't.
I then ran the ant build from the console. Now I'm getting sorta random behaviour. Sometimes URLNormalizer cannot be found. I fix that by readjusting the plugins directory.
Anyhow, what's with the nutch-default.xml and nutch-site.xml? I figured nutch-site.xml overrides default, but I think that is not the case right now.
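
For reference, the override behaviour expected here can be sketched with plain java.util.Properties. This is only a stand-in to show the intended precedence (a value from the site file shadows the default); it is not the configuration class Nutch actually uses, the class name LayeredConfSketch is made up, and the property values are invented for the example.

```java
import java.util.Properties;

/** Stand-in for layered configuration; NOT the class Nutch uses internally. */
final class LayeredConfSketch {

    /** Returns a view in which entries from site shadow entries from defaults. */
    static Properties layer(Properties defaults, Properties site) {
        // Values in 'defaults' are consulted only when a key is absent from 'merged'.
        Properties merged = new Properties(defaults);
        for (String key : site.stringPropertyNames()) {
            merged.setProperty(key, site.getProperty(key));
        }
        return merged;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();          // plays the role of nutch-default.xml
        defaults.setProperty("plugin.folders", "plugins");
        defaults.setProperty("http.agent.name", "");

        Properties site = new Properties();              // plays the role of nutch-site.xml
        site.setProperty("http.agent.name", "my-test-crawler"); // hypothetical value

        Properties conf = layer(defaults, site);
        System.out.println(conf.getProperty("http.agent.name")); // prints "my-test-crawler"
        System.out.println(conf.getProperty("plugin.folders"));  // prints "plugins"
    }
}
```

If a run behaves as though the site values were never applied, a likely explanation under this model is that the run is reading an older copy of the site file rather than that the precedence is reversed.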
After running the ant build, running inside Eclipse b0rks with several errors. URLs are not picked up anymore, and so on...
So how do I have to set things up to enable small incremental iterations of the form:
- Write skeleton for plugin (configs + code)
- Write a unit test
- Add URLs that should use the plugin
- Debug the code in Eclipse
- Use nutch-bean to verify the results (not sure about this; is there a better way?)
- Use ant to deploy
- Start over again.

Thanks in advance.
