It depends on whether you control the seed pages. If you do, you could tag them with index="no" and skip them during indexing, while still following their links. To do this you would have to change HtmlParser and BasicIndexingFilter.
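To make the idea concrete, here is a rough, self-contained sketch of the two decisions involved: always collect the outlinks, but only index a page if it does not carry the marker. It does not touch the Nutch classes at all. The class name SeedPageMarker, the methods shouldIndex and extractLinks, and the choice to put index="no" on the <html> tag are all assumptions made up for illustration. The real change would go into HtmlParser (to detect the marker) and BasicIndexingFilter (to skip the document).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: shows "crawl for links, but do not index".
final class SeedPageMarker {

    // Assumption: seed pages are tagged as <html index="no">.
    private static final Pattern NO_INDEX = Pattern.compile(
            "<html\\b[^>]*\\sindex\\s*=\\s*[\"']?no\\b",
            Pattern.CASE_INSENSITIVE);

    // Very simplified link extraction; a real parser would walk the DOM instead.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*\\shref\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    private SeedPageMarker() {
    }

    // Returns false for pages that carry the index="no" marker.
    static boolean shouldIndex(String html) {
        return !NO_INDEX.matcher(html).find();
    }

    // Links are collected whether or not the page itself gets indexed.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String seed = "<html index=\"no\"><body>"
                + "<a href=\"http://example.com/a\">a</a>"
                + "</body></html>";
        System.out.println("index seed page? " + shouldIndex(seed));
        System.out.println("outlinks: " + extractLinks(seed));
    }
}
```

The point of keeping the two checks separate is that the crawler still sees every outlink from a seed page, and only the indexing step looks at the marker.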
Rgrds,
Thomas

On 4/4/06, Benjamin Higgins <[EMAIL PROTECTED]> wrote:
>
> Hello,
>
> I've gone through the documentation and tried searching the mailing list
> archives. I bet this has come up before, but I just couldn't find it. So,
> if someone could point me to a past discussion that would be great.
>
> What I want to do is be able to crawl html files for links, but not actually
> index that file. I ask this because I have several seed pages that are not
> meant for human consumption, so I never want them to show up in search
> results.
>
> How can this be accomplished?
>
> Thanks in advance,
>
> Ben
