Furthermore, I don't have any idea about the structure of Nutch ;)
But nevertheless, if there is no other way, I'll give it a try...
So, for the moment, two more questions:
1. Is there a how-to somewhere on getting started with writing plugins, or are there just the API docs?
2. The Java source of the language identifier plugin you are talking about is located at "trunk/src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang", isn't it?
Thanks, Tom
You can write your own index filter plugin and query filter plugin, and add a metadata field to the index to identify the "start URLs".
Take a look at the language identifier to get an idea.
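As a rough illustration of the two roles described above, here is a self-contained Java sketch. It does not use Nutch's real plugin interfaces; SeedFilterSketch, SeedIndexFilter, Doc, search and the "seed" field name are all hypothetical stand-ins. At index time, a document whose URL is in the start list gets a seed=true field. At query time, an optional flag restricts hits to tagged documents, so a single index can serve both the start-URL search and the whole-web search.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch, NOT the Nutch plugin API: models tagging start-URL
 * documents at index time and restricting queries to them at search time.
 */
public class SeedFilterSketch {

    /** Field name used to mark start-URL documents (hypothetical). */
    static final String SEED_FIELD = "seed";

    /** Minimal stand-in for an indexed document: a URL plus string fields. */
    static final class Doc {
        final String url;
        final Map<String, String> fields = new HashMap<>();
        Doc(String url) { this.url = url; }
    }

    /** Index-side role: adds seed=true when the document's URL is a start URL. */
    static final class SeedIndexFilter {
        private final Set<String> seeds;
        SeedIndexFilter(Set<String> seeds) { this.seeds = new HashSet<>(seeds); }

        Doc filter(Doc doc) {
            if (seeds.contains(doc.url)) {
                doc.fields.put(SEED_FIELD, "true");
            }
            return doc;
        }
    }

    /**
     * Query-side role: when seedsOnly is true, keep only documents tagged by
     * the index filter. url.contains(term) is a toy stand-in for real
     * full-text matching.
     */
    static List<Doc> search(List<Doc> index, String term, boolean seedsOnly) {
        List<Doc> hits = new ArrayList<>();
        for (Doc d : index) {
            boolean matches = d.url.contains(term);
            boolean allowed = !seedsOnly || "true".equals(d.fields.get(SEED_FIELD));
            if (matches && allowed) {
                hits.add(d);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SeedIndexFilter filter = new SeedIndexFilter(Set.of("http://a.example/"));
        List<Doc> index = new ArrayList<>();
        for (String u : List.of("http://a.example/", "http://b.example/")) {
            index.add(filter.filter(new Doc(u)));
        }
        System.out.println(search(index, "example", false).size()); // whole-web: 2
        System.out.println(search(index, "example", true).size());  // start URLs only: 1
    }
}
```

Matching on the exact URL tags only the start pages themselves; if the goal is to cover everything crawled from those sites, matching on the host instead would be a natural variation.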
Stefan
On 12.04.2005 at 19:33, Tom Smets wrote:
Hello list,
I have a list of about 3000 URLs that I want to crawl.
Furthermore, I want to start a web crawl with those URLs as the initial fetchlist.
Later, I want to be able to choose between a search over just those 3000 URLs and a whole-web search.
Is it possible to use just one database (from the whole-web search) to get the desired results,
or do I need to build two databases?
Thanks, Tom
---------------------------------------------------------------
company: http://www.media-style.com
forum: http://www.text-mining.org
blog: http://www.find23.net
