Hi,

Sometimes we accidentally crawl URLs in formats we don't need and push them all the way through to the final "solrindex" step.
As far as I know, we can drop those URLs by adding a regex to
regex-urlfilter.txt and running "nutch updatedb". Those URLs will then be
removed from the crawldb.
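
To illustrate the kind of rule I mean, here is a minimal sketch of a regex-urlfilter.txt entry. The host and path are hypothetical placeholders, not our real URLs. Lines starting with "-" exclude matching URLs, lines starting with "+" include them, and the first matching rule wins:

```
# Hypothetical example: skip the "print" pages we crawled by mistake
-^https?://www\.example\.com/print/

# Accept anything else
+.
```
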
But how can we make sure that URLs that were already indexed by Solr ("nutch solrindex")
before we ran "nutch updatedb" are deleted as well?
Are those URLs also deleted from the index when we run "solrindex" again?
Thank you.
--
wassalam,
[bayu]

