Blocking a page in a URL filter means the page will not be fetched at
all, so its links would never be followed; that doesn't solve your problem.
You can remove the page from the index manually, e.g. by using the
PruneIndexTool.
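For reference, a rough sketch of invoking the tool from the command line. The class path, the index directory name, and the -queries option are assumptions based on the 0.7-era tools and may differ in your version, so check the tool's usage output first:

```
# Hypothetical invocation; verify the class name and flags against your Nutch version.
# queries.txt would contain one Lucene query per line identifying the documents
# to prune, e.g.:  url:http://intranet.example.com/startpage.html
bin/nutch org.apache.nutch.tools.PruneIndexTool index -queries queries.txt
```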
However, I have something here that can also solve the problem, but I
need some more time to prepare a patch.
Stefan
On 21.01.2006 at 16:54, Franz Werfel wrote:
Yes, that is an option we are certainly considering, but we would
rather just set up a start page and forget about it.
Cheers, Fr
On 1/20/06, Neal Whitley <[EMAIL PROTECTED]> wrote:
Franz,
Someone else will need to confirm this...
FYI: why not simply inject the URLs directly into Nutch?
./nutch inject db/ -urlfile seeds.txt
At 03:49 PM 1/20/2006, you wrote:
Thank you, but if I do that, will the page still be read for URLs?
Cheers, Frank
On 1/20/06, Neal Whitley <[EMAIL PROTECTED]> wrote:
Franz,
I *think* you could use the regex URL filter (regex-urlfilter.txt) to
keep this page out of the index.
Something like: -^http://([a-z0-9]*\.)*tripod.com/
I am new to Nutch so I make no guarantee... :-)
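As a side note, you can sanity-check a candidate pattern outside Nutch before editing regex-urlfilter.txt. The leading '-' in the filter file means "exclude", so only the regex itself is tested below; the sample URL is just a stand-in for the page you want to match:

```shell
# Test the exclusion regex (without its leading '-') against a sample URL
# using grep -E (extended regular expressions, same general syntax).
pattern='^http://([a-z0-9]*\.)*tripod.com/'
echo 'http://members.tripod.com/~frank/links.html' | grep -qE "$pattern" \
  && echo "excluded by filter" \
  || echo "not matched"
```

Keep in mind, though, that as Stefan points out a URL filter keeps the page from being fetched entirely, not just from being indexed.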
Neal
At 05:23 AM 1/20/2006, you wrote:
Hello,
We are trying to implement Nutch on an intranet and have set up a
special page which has links to all the other pages of the site, since
many of them are not linked together.
We will start with this special page and crawl from there to all the
other pages, but we would like not to index the first page itself (so
that it doesn't show up in search results), just use it for its links.
Is this easily possible?
Thank you.
---------------------------------------------------------------
company: http://www.media-style.com
forum: http://www.text-mining.org
blog: http://www.find23.net