RE: How to check URL that have been indexed by Solr?

Markus Jelsma Tue, 18 Feb 2014 00:26:24 -0800

Right now the only way to do so is a manual deleteByQuery operation using 
Lucene's regex query capability. Keep in mind that Nutch's filter does a 
find() where Lucene needs a match(), so you have to rewrite the patterns: 
an unanchored Nutch pattern has to be wrapped in .* so that it matches the 
whole term. 
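
A rough sketch of that rewrite, in Python, in case it helps. It only handles 
the anchoring difference (find() vs. match()); Lucene's regex syntax differs 
from Java's in other ways too, so anything beyond simple patterns still needs 
checking by hand. The helper names and the "id" field are assumptions for 
illustration, so use whichever untokenized field holds the URL in your Solr 
schema.

```python
import json


def find_to_match(pattern: str) -> str:
    """Turn a find()-style (unanchored) regex into a match()-style one.

    Lucene regex queries always match the whole term, so explicit
    anchors are dropped and missing ones are replaced by '.*'.
    Simplistic: an escaped backslash right before '$' is not handled.
    """
    if pattern.startswith("^"):
        pattern = pattern[1:]          # start already anchored
    else:
        pattern = ".*" + pattern       # allow anything before
    if pattern.endswith("$") and not pattern.endswith("\\$"):
        pattern = pattern[:-1]         # end already anchored
    else:
        pattern = pattern + ".*"       # allow anything after
    return pattern


def delete_by_query_body(pattern: str, field: str = "id") -> str:
    """Build a JSON deleteByQuery body; 'field' is an assumed field name."""
    # Forward slashes delimit a regex in the query syntax, so escape them.
    regex = find_to_match(pattern).replace("/", "\\/")
    return json.dumps({"delete": {"query": f"{field}:/{regex}/"}})


if __name__ == "__main__":
    # Example filter patterns (made up for illustration).
    print(delete_by_query_body(r"\.pdf$"))
    print(delete_by_query_body("^http://example.com/tmp/"))
```

The resulting body would be POSTed to the core's update handler with a JSON 
content type and followed by a commit. Running the same query as a normal 
search first is a cheap way to see what would be deleted.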
 
-----Original message-----
> From:Bayu Widyasanyata <[email protected]>
> Sent: Tuesday 18th February 2014 0:02
> To: [email protected]
> Subject: How to check URL that have been indexed by Solr?
> 
> Hi,
> 
> Sometimes we accidentally crawl URLs in formats we do not need, and they
> get pushed all the way to the last "solrindex" step.
> 
> As we know, we can drop or delete those URLs by adding a regex to
> regex-urlfilter.txt and running "nutch updatedb". Those URLs will then be
> dropped/deleted from the crawldb database.
> 
> But how can we ensure that URLs already indexed by Solr ("nutch solrindex")
> before we ran "nutch updatedb" have also been deleted?
> Are those URLs also deleted when we perform "solrindex" again?
> 
> Thank you.-
> 
> -- 
> wassalam,
> [bayu]
>
