Hi Martina

I knew that changes to the config files would not be picked up until a full rebuild of the project (or, for that matter, until I copy them over to the tmp_build directory). What really contributes to my current headache is that I suddenly got ClassNotFoundExceptions, and now the crawler ignores URLs that it did not ignore yesterday. I will definitely have to set up my own SVN repo for the work on the plugin. Then I can tag versions that do run, just in case something goes bad again. Right now, the crawler is useless if started from Eclipse. If started from the command line, it works as intended; the plugin, however, does not yet. I figure this is because the special fields I extract from the documents are not yet added as fields to the Lucene documents.
But as long as Nutch is not debuggable, there is no telling...


Koch Martina wrote:
Hi Silvio,

I'm not sure I understood you correctly, but you might have a problem with 
the conf file locations.
When I run Nutch within Eclipse, the standard conf directory in <NUTCH_HOME>/conf 
is not used; instead, the conf files in the build directory (<NUTCH_HOME>/tmp_build, if 
you followed the instructions in the wiki) are used. The conf files in the tmp_build 
directory are a copy of the conf directory, made during the build. If you changed 
the files in conf, those changes are not picked up unless you do a clean and build 
afterwards.
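
A quick way to check whether this is what is happening is to compare the two directories. The following is only a minimal sketch using the plain JDK, not anything that ships with Nutch: the class name ConfDiff and the method staleFiles are made up for illustration, and the default paths "conf" and "tmp_build" are assumptions based on the layout described above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Nutch: reports conf files whose copy in
// the build directory is missing or has different content.
class ConfDiff {

    /** Names of regular files in source that are missing from, or differ in, copy. */
    static List<String> staleFiles(Path source, Path copy) {
        List<String> stale = new ArrayList<>();
        try (Stream<Path> listing = Files.list(source)) {
            // Sort so the report is stable between runs.
            List<Path> files = listing.sorted().collect(Collectors.toList());
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // top-level files only, subdirectories are skipped
                }
                String name = file.getFileName().toString();
                Path twin = copy.resolve(name);
                boolean same = Files.isRegularFile(twin)
                        && Arrays.equals(Files.readAllBytes(file), Files.readAllBytes(twin));
                if (!same) {
                    stale.add(name);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return stale;
    }

    public static void main(String[] args) {
        // Assumed defaults: run from <NUTCH_HOME>, or pass both paths explicitly.
        Path source = Paths.get(args.length > 0 ? args[0] : "conf");
        Path copy = Paths.get(args.length > 1 ? args[1] : "tmp_build");
        for (String name : staleFiles(source, copy)) {
            System.out.println("stale or missing in build dir: " + name);
        }
    }
}
```

Compile it with javac and run "java ConfDiff conf tmp_build" from <NUTCH_HOME>. Any file it lists has been changed in conf but not copied to the build directory yet.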

Hope this helps.

Martina


-----Original Message-----
From: Silvio Heuberger [mailto:[EMAIL PROTECTED]]
Sent: 02 December 2008 11:17
To: [email protected]
Subject: Re: Ideal development environment for nutch plugins?

Let me clarify this:
The behaviour of running the Crawl class from within Eclipse is radically different from running it from the command line.

Silvio Heuberger wrote:
OK, here's supposedly an easy one:

What is the ideal setup to develop Nutch plugins? I'm an Eclipse lover, so I set myself up with a checkout of the 0.9 branch. The Eclipse project is set up fine and I have been able to use the debugger to trace where the code goes and where it doesn't. I then ran the ant build from the console. Now I'm getting sorta random behaviour. Sometimes URLNormalizer cannot be found. I fix that by readjusting the plugins directory. Anyhow, what's with the nutch-default.xml and nutch-site.xml? I figured nutch-site.xml overrides default, but I think that is not the case right now.
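
To see which copies of the two files a given launch would actually find, something like the sketch below might help. It assumes that Nutch resolves nutch-default.xml and nutch-site.xml by name from the classpath, which is my understanding of how its Hadoop-based configuration works but is not verified here. The class ClasspathProbe and the method locate are hypothetical names, and only plain JDK calls are used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic, not part of Nutch: lists every classpath location
// that provides a resource with the given name, in class loader search order.
class ClasspathProbe {

    /** All classpath URLs for the resource; the first entry is what getResource() would return. */
    static List<URL> locate(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathProbe.class.getClassLoader();
        }
        try {
            return new ArrayList<>(Collections.list(loader.getResources(resourceName)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // File names taken from the thread; pass other names as arguments if needed.
        String[] names = args.length > 0
                ? args
                : new String[] {"nutch-default.xml", "nutch-site.xml"};
        for (String name : names) {
            List<URL> hits = locate(name);
            if (hits.isEmpty()) {
                System.out.println(name + ": not on the classpath");
            }
            for (int i = 0; i < hits.size(); i++) {
                System.out.println(name + " [" + i + "]: " + hits.get(i));
            }
        }
    }
}
```

Run it once with the same classpath as the Eclipse launch configuration and once with the classpath used on the command line. If the first hit for nutch-site.xml differs between the two, or the file is missing in one of them, that would explain why the overrides appear to be ignored.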

After running the ant build, running inside Eclipse b0rks with several errors. URLs are not picked up anymore and so on... So how do I have to set things up to enable small incremental iterations of the form:

- Write skeleton for plugin (configs + code)
- Write unit test
- Add URLs that should use the plugin
- Debug the code in Eclipse
- Use nutch-bean to verify the results (not sure about this; is there a better way?)
- Use ant to deploy
- Start over again

Thanks in advance.

