Hi Martina
I knew that changes to the config files would not be picked up until a
full rebuild of the project (or, as a matter of fact, until I copy them
over to the tmp_build directory).
What substantially contributes to my current headache is the fact that
I suddenly got ClassNotFoundExceptions, and now the crawler ignores URLs
that it did not ignore yesterday.
I will definitely have to set up my own SVN repo for the work on the
plugin. Then I can tag versions that do run, just in case something goes
bad again. Right now, the crawler is useless if started from Eclipse. If
started from the command line, it works as intended; the plugin, however,
does not work yet. I figure this is because the special fields I extract from
the documents are not yet added as fields to the Lucene documents.
But as long as Nutch is not debuggable, there is no telling...
Koch Martina wrote:
Hi Silvio,
I'm not sure if I understood you correctly, but you might have a problem with
the conf file locations.
When I run Nutch within Eclipse, the standard conf directory in <NUTCH_HOME>/conf
is not used; instead, the conf files in the build directory (<NUTCH_HOME>/tmp_build, if
you followed the instructions in the wiki) are used. The conf files in the tmp_build
directory are a copy of the conf directory and are copied during the build. If you change
the files in conf, the changes are not picked up unless you do a clean and build
afterwards.
Hope this helps.
Martina
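One way to see which copy of the conf files a given launch actually reads is to ask the class loader where it resolves them. The following is a minimal sketch, assuming the files are looked up on the classpath, as the description above suggests; the class name ConfLocator is made up for illustration and is not part of Nutch.

```java
import java.net.URL;

public class ConfLocator {

    /** Returns the classpath location of a resource, or null if it is not on the classpath. */
    static URL locate(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ConfLocator.class.getClassLoader();
        }
        return loader.getResource(resourceName);
    }

    public static void main(String[] args) {
        // The two config file names mentioned in this thread.
        String[] names = {"nutch-default.xml", "nutch-site.xml"};
        for (String name : names) {
            URL url = locate(name);
            System.out.println(name + " -> " + (url == null ? "not on classpath" : url));
        }
    }
}
```

Running it once with the Eclipse launch configuration's classpath and once with the command-line classpath should show whether the two launches resolve the files from conf or from tmp_build.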
-----Original Message-----
From: Silvio Heuberger [mailto:[EMAIL PROTECTED]
Sent: 02 December 2008 11:17
To: [email protected]
Subject: Re: Ideal development environment for nutch plugins?
Let me clarify this:
The behaviour of running the Crawl class from within Eclipse is
radically different from running it from the command line.
Silvio Heuberger wrote:
OK, here's supposedly an easy one:
What is the ideal setup to develop Nutch plugins? I'm an Eclipse lover,
so I set myself up with a checkout of the 0.9 branch.
The Eclipse project is set up fine and I have been able to use the
debugger to trace where the code goes and where it doesn't.
I then ran the ant build from console. Now I'm getting sorta random
behaviour. Sometimes URLNormalizer cannot be found. I fix that by
readjusting the plugins directory.
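When a plugin class such as URLNormalizer goes missing, a quick sanity check is whether the directory that the plugin.folders property points to really contains plugin subdirectories, each with a plugin.xml descriptor. Below is a rough sketch of such a check using only the JDK; the class name PluginFolderCheck and the default path "plugins" are assumptions for illustration, and the code does not use Nutch's own plugin loader.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PluginFolderCheck {

    /** Lists subdirectories of pluginsDir that contain a plugin.xml file, sorted by path. */
    static List<String> findPlugins(File pluginsDir) {
        List<String> found = new ArrayList<>();
        File[] children = pluginsDir.listFiles(); // null if pluginsDir is not a directory
        if (children == null) {
            return found;
        }
        Arrays.sort(children);
        for (File child : children) {
            if (child.isDirectory() && new File(child, "plugin.xml").isFile()) {
                found.add(child.getName());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumed default; pass the directory your plugin.folders value resolves to.
        File dir = new File(args.length > 0 ? args[0] : "plugins");
        System.out.println("Checking " + dir.getAbsolutePath());
        List<String> plugins = findPlugins(dir);
        if (plugins.isEmpty()) {
            System.out.println("No plugin directories with a plugin.xml found here.");
        }
        for (String name : plugins) {
            System.out.println("  " + name);
        }
    }
}
```

Because a relative path is resolved against the working directory, running this from the Eclipse launch directory and from the console may give different answers.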
Anyhow, what's with nutch-default.xml and nutch-site.xml? I figured
nutch-site.xml overrides nutch-default.xml, but I think that is not the
case right now.
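The override behaviour expected here can be pictured as a layered lookup: a key set in the site file shadows the same key in the defaults, and everything else falls through. The sketch below shows only that idea, with java.util.Properties as a stand-in; it is not how Nutch loads its XML files, and the class name LayeredConfig and the sample values are made up.

```java
import java.util.Properties;

public class LayeredConfig {

    /** Builds a lookup in which entries in site shadow entries in defaults. */
    static Properties layer(Properties defaults, Properties site) {
        // Properties passed to the constructor are consulted only when a key is missing.
        Properties merged = new Properties(defaults);
        for (String key : site.stringPropertyNames()) {
            merged.setProperty(key, site.getProperty(key));
        }
        return merged;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("http.agent.name", "");        // illustrative default
        defaults.setProperty("plugin.folders", "plugins");  // illustrative default

        Properties site = new Properties();
        site.setProperty("http.agent.name", "my-test-crawler"); // hypothetical override

        Properties conf = layer(defaults, site);
        System.out.println(conf.getProperty("http.agent.name")); // prints "my-test-crawler"
        System.out.println(conf.getProperty("plugin.folders"));  // prints "plugins"
    }
}
```

If a site value appears not to win in practice, it is worth checking that the site file being edited is the same copy that is actually read at run time.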
After running the ant build, running inside Eclipse b0rks with several
errors. URLs are not picked up anymore, and so on...
So how do I have to set things up to enable small incremental
iterations of the form:
- Write skeleton for plugin (configs + code)
- Write unit test
- Add URLs that should use the plugin
- Debug the code in Eclipse
- Use nutch-bean to verify the results (not sure about this. Is there a
better way?)
- Use ant to deploy
- Start over again
Thanks in advance.