ok, sounds quite easy. the only problem is that i have never written anything in java, and
further, i don't have any idea about the structure of nutch ;)


but nevertheless, if there is no other way,
i'll give it a try...

so for the moment, two more questions:

1. is there a howto somewhere on how to start writing plugins,
or are there just the api docs?

2. the java source of the language identifier plugin you are talking about
is located at:
"trunk/src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang"
isn't it?


thanks, tom


You can write your own index filter plugin and query filter, and add a metadata field to the index to identify the "start urls".
Take a look at the language identifier plugin to get an idea.
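
A rough sketch of the idea in plain Java, just to show the shape of it. This does not use the real Nutch plugin interfaces: the class SeedTagger, the field name "seed", and both methods are hypothetical names made up for illustration. The index side marks each document as a start url or not, and the query side uses that mark to restrict a search to the start urls.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, NOT part of Nutch: it only illustrates tagging each
// indexed document with a field that says whether it is a start url.
final class SeedTagger {
    // Hypothetical field name; any otherwise unused field name would do.
    static final String SEED_FIELD = "seed";

    private final Set<String> seedUrls;

    SeedTagger(List<String> seeds) {
        this.seedUrls = new HashSet<>(seeds);
    }

    // Index side: return a copy of the document's fields with the seed flag added.
    Map<String, String> tag(String url, Map<String, String> fields) {
        Map<String, String> out = new HashMap<>(fields);
        out.put(SEED_FIELD, seedUrls.contains(url) ? "true" : "false");
        return out;
    }

    // Query side: with seedOnly set, accept only documents flagged as start urls;
    // otherwise accept every document (whole-web search).
    static boolean matches(Map<String, String> doc, boolean seedOnly) {
        return !seedOnly || "true".equals(doc.get(SEED_FIELD));
    }
}
```

In a real plugin the same two steps would live in the index filter and the query filter, so one database can serve both kinds of search.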


Stefan

On 12.04.2005 at 19:33, Tom Smets wrote:

hello list,
i have a list of about 3000 urls which i want to crawl.
further i want to start a webcrawl with those urls as the initial fetchlist.


later i want to have the possibility to choose between a search over just the 3000 urls and a whole-web-search.

is it possible to use just one database (from the whole-web-search) to get the desired results,
or do i need to build two databases?


thanks,
tom


---------------------------------------------------------------
company:                http://www.media-style.com
forum:          http://www.text-mining.org
blog:                   http://www.find23.net




