Blocking a page in a URL filter also prevents it from being fetched at all, so that doesn't solve your problem. You can remove the page from the index manually, e.g. by using PruneIndexTool. However, I have something here that could also solve the problem, but I need some more time to prepare a patch.
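For reference, PruneIndexTool in the 0.7-era releases is driven by a file of Lucene queries, one per line; matching documents are deleted from the index. A rough sketch (the class path and options may differ by release, and the URL in the query file is only a placeholder for your start page):

```shell
# prune-queries.txt holds one Lucene query per line, e.g.:
#   url:"http://intranet/start.html"

# Dry run first: report what would be deleted without touching the index.
bin/nutch org.apache.nutch.tools.PruneIndexTool index \
    -dryrun -queries prune-queries.txt

# Re-run without -dryrun to actually delete the matching documents.
```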

Stefan

On 21.01.2006, at 16:54, Franz Werfel wrote:

Yes, that is an option we are certainly considering, but we would
rather have a start page and forget about it.
Cheers, Fr

On 1/20/06, Neal Whitley <[EMAIL PROTECTED]> wrote:
Franz,

Someone else will need to confirm this...

FYI... why not simply inject the URLs directly into Nutch?

./nutch inject db/ -urlfile seeds.txt
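For context, injecting only seeds the WebDB; with the 0.7-era tools the actual fetching is driven by separate commands, roughly as below (directory names are just examples, and the exact options vary by release):

```shell
bin/nutch inject db -urlfile seeds.txt   # seed the WebDB with the start URLs
bin/nutch generate db segments           # create a fetchlist segment
s=`ls -d segments/2* | tail -1`          # pick the newest segment directory
bin/nutch fetch $s                       # fetch the pages on the fetchlist
bin/nutch updatedb db $s                 # feed discovered links back into the WebDB
bin/nutch index $s                       # index the fetched segment
```

Repeating the generate/fetch/updatedb cycle follows the links discovered on the start page out to the rest of the site.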


At 03:49 PM 1/20/2006, you wrote:

Thank you, but if I do that, will the page still be read for URLs?
Cheers, Frank

On 1/20/06, Neal Whitley <[EMAIL PROTECTED]> wrote:
Franz,

I 'think' you could use the regex URL filter to keep this page from
being indexed (regex-urlfilter.txt).

Something like:  -^http://([a-z0-9]*\.)*tripod.com/
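To see what that pattern actually matches, you can check the regex body with grep (the leading '-' in regex-urlfilter.txt means "exclude" and is not part of the regex; I've also escaped the dot in tripod\.com, which is slightly stricter than the version above — the URLs here are made up):

```shell
pattern='^http://([a-z0-9]*\.)*tripod\.com/'

# A tripod.com URL matches the pattern, so the filter would exclude it.
echo 'http://www.tripod.com/page.html' | grep -Eq "$pattern" && echo excluded

# Any other host does not match, so it would be kept.
echo 'http://example.org/page.html' | grep -Eq "$pattern" || echo kept
```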

I am new to Nutch, so I make no guarantees... :-)

Neal



At 05:23 AM 1/20/2006, you wrote:

Hello,

We are trying to implement Nutch on an intranet and have set up a
special page that links to all the other pages of the site, since
many of them are not linked together.
We will start the crawl from this special page and go from there to
all the other pages, but we would like to keep this first page out of
the index (so that it doesn't show up in search results) and use it
only for its links.
Is this easy to do?

Thank you.






---------------------------------------------------------------
company:        http://www.media-style.com
forum:        http://www.text-mining.org
blog:            http://www.find23.net

