Just recrawl and reindex every day. That was the simple answer.
The more complex answer is that you need to write custom code that
deletes documents from your index and crawldb.
If you do not want to learn the internals of Nutch in full, just
recrawl and reindex. :)
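For the custom-code route, a minimal sketch of the preparation step only, assuming the change list is plain text with one "changed <url>" or "removed <url>" entry per line. That format and the name split_changes are hypothetical, not anything Nutch defines. The sketch just separates the URLs to fetch again from the URLs to delete; the actual deletion from the index and crawldb is not shown, since it depends on the Nutch version in use.

```python
def split_changes(lines):
    """Split 'changed <url>' / 'removed <url>' lines into two URL lists.

    The input format is an assumption made for this sketch.
    Returns (to_refetch, to_delete), each in first-seen order.
    """
    to_refetch = []
    to_delete = []
    seen = set()
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"malformed line: {raw!r}")
        status, url = parts
        # If a URL is listed more than once, the first entry wins.
        if url in seen:
            continue
        seen.add(url)
        if status == "changed":
            to_refetch.append(url)
        elif status == "removed":
            to_delete.append(url)
        else:
            raise ValueError(f"unknown status {status!r} in line: {raw!r}")
    return to_refetch, to_delete


if __name__ == "__main__":
    sample = [
        "changed http://intranet.example/docs/a.html",
        "removed http://intranet.example/docs/b.html",
    ]
    refetch, delete = split_changes(sample)
    print("refetch:", refetch)
    print("delete:", delete)
```

The first list could serve as a seed list for a small recrawl, and the second as input to whatever deletion code you write.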
Stefan
On 06.06.2006 at 19:42, Benjamin Higgins wrote:
Hello,
I'm trying to make Nutch suitable for our (extensive) intranet. One
problem I'm trying to solve is how best to tell Nutch to either
reindex or remove a URL from the index. I have a lot of pages that get
changed, added and removed daily, and I'd prefer to have the changes
reflected in Nutch's index immediately.
I am able to generate a list of URLs that have changed or have been
removed, so I definitely do not need to reindex everything; I just
need a way to pass this list on to Nutch.
How can I do this?
Ben