Hi Jaime,
Depending on what exactly you're trying to do, there are some other projects that offer crawler functionality which could be easier to embed.
The two I know about are:
- Droids (http://incubator.apache.org/droids/), though I haven't really used it.
- Bixo (http://bixo.101tec.com/), which is a project I'm actively working on.
-- Ken
On Oct 1, 2009, at 9:37am, Jaime Martín wrote:
Thank you for the info. That's really a problem. I have a Java project, and for some of its new features I would like to use Nutch. As I need to customise Nutch, my idea was the following:
- 1st: change what is needed for my requirements in my downloaded copy of Nutch and generate a "Nutch library".
- 2nd: add that library to the other project and invoke its features when needed.
Is that not advisable? What, then, is the best way to generate a Nutch library to be used in other Java projects? Or is that not possible without going crazy over configuration issues?
2009/10/1 Andrzej Bialecki <a...@getopt.org>
Jaime Martín wrote:
Hi!
I have a Java application that I would like to "upgrade" with Nutch. Which jars should I add to my application's lib directory so that I can use Nutch features from some of my app pages and business logic classes?
I've tried nutch-1.0.jar, generated by the "war" target, without success. Which Nutch build.xml target should I execute for this, and which of the generated jars should be included in my app? Apart from nutch-1.0.jar, are all the nutch-1.0\lib jars compulsory, or just a few of them?
Thanks in advance!
Nutch is not designed for embedding in other applications, so you may face numerous problems. I did such an integration once, and it was far from obvious. A lot also depends on whether you want to run it on a distributed cluster or in a single JVM (local mode).
Take a look at build/nutch*.job; it's a jar file that contains all the dependencies needed to run Nutch except for the Hadoop libraries (which are also required).
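Because the job file is an ordinary jar, one way to check which dependencies it bundles is simply to list its entries. The sketch below assumes the bundled jars sit under a lib/ directory inside the job file (the usual layout for Hadoop job jars); the class name JobFileContents and the example path build/nutch-1.0.job are hypothetical. It uses only java.util.jar from the JDK.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper: lists the dependency jars bundled inside a Nutch job file. */
public class JobFileContents {

    /** Assumption: bundled dependencies live under "lib/", as in Hadoop job jars. */
    static boolean isBundledJar(String entryName) {
        return entryName.startsWith("lib/") && entryName.endsWith(".jar");
    }

    /** Returns the sorted names of all jar entries found under lib/ in the job file. */
    static List<String> bundledJars(String jobPath) throws IOException {
        List<String> jars = new ArrayList<>();
        // The job file is a plain jar, so JarFile can open it directly
        try (JarFile job = new JarFile(jobPath)) {
            Enumeration<JarEntry> entries = job.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (isBundledJar(name)) {
                    jars.add(name);
                }
            }
        }
        Collections.sort(jars);
        return jars;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JobFileContents build/nutch-1.0.job  (hypothetical path)
        for (String jar : bundledJars(args[0])) {
            System.out.println(jar);
        }
    }
}
```

Running it against the job file prints one lib/... entry per line; whatever it lists, the Hadoop jars would still have to be added to the application's classpath separately, as noted above.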
--
Best regards,
Andrzej Bialecki <><
--------------------------
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-210-6378