Hi Jaime,

Depending on what exactly you're trying to do, there are some other projects that offer crawler functionality which could be easier to embed.

The two I know about are:

- Droids (http://incubator.apache.org/droids/), though I haven't really used it.
- Bixo (http://bixo.101tec.com/), which is a project I'm actively working on.

-- Ken

On Oct 1, 2009, at 9:37am, Jaime Martín wrote:

Thank you for the info. That's really a problem. I have a Java project, and
for some of its new features I would like to use Nutch. As I need to
customise Nutch, my idea was the following:
- 1st: change whatever my requirements call for in my downloaded copy of
Nutch and generate a "Nutch library"
- 2nd: add that library to the other project and invoke the library's
features when needed

Is that not advisable? If so, what is the best way to generate a Nutch
library that can be used in other Java projects? Or is that not possible
without going crazy over configuration issues?



2009/10/1 Andrzej Bialecki <a...@getopt.org>

Jaime Martín wrote:

Hi!
I have a Java application that I would like to "upgrade" with Nutch. Which
jars should I add to my application's lib directory so that I can use Nutch
features from some of my app's pages and business logic classes?
I've tried the nutch-1.0.jar generated by the "war" target, without success.
I wonder which Nutch build.xml target I should run for this, and which of
the generated jars should be included in my app. Apart from nutch-1.0.jar,
are all the jars in nutch-1.0\lib compulsory, or just a few of them?
Thanks in advance!


Nutch is not designed for embedding in other applications, so you may face
numerous problems. I did such an integration once, and it was far from
obvious. A lot also depends on whether you want to run it on a distributed
cluster or in a single JVM (local mode).

Take a look at build/nutch*.job, it's a jar file that contains all
dependencies needed to run Nutch except for Hadoop libraries (which are also
required).
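
To see which dependency jars such a job file actually bundles, it can be opened like any other jar. The following is only a minimal sketch using the standard java.util.jar API; the class name JobJarInspector and the method bundledJars are made up for illustration and are not part of Nutch, and the sketch assumes nothing about the internal layout of the job file beyond it being a regular jar archive.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Nutch: lists the jars nested in an archive.
public class JobJarInspector {

    // Returns the names of all entries ending in ".jar" inside the given
    // archive, sorted alphabetically.
    static List<String> bundledJars(String jobFilePath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile job = new JarFile(jobFilePath)) {
            Enumeration<JarEntry> entries = job.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!entry.isDirectory() && entry.getName().endsWith(".jar")) {
                    names.add(entry.getName());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JobJarInspector <path-to-job-file>");
            return;
        }
        // Print one bundled jar per line.
        for (String name : bundledJars(args[0])) {
            System.out.println(name);
        }
    }
}
```

Run it with the path of the job file produced by the build as the only argument. Whatever it prints still has to be complemented by the Hadoop libraries, as noted above.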

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



--------------------------
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-210-6378
