Hi Markus,

Thanks for the hints.

Regards,

On Tue, Feb 18, 2014 at 3:25 PM, Markus Jelsma <[email protected]> wrote:

> Right now the only way to do so is a manual deleteByQuery operation using
> Lucene's capability for regex queries. Keep in mind that Nutch's filter does
> a find() where Lucene needs a match(), so you have to rewrite the queries.
>
> -----Original message-----
> > From: Bayu Widyasanyata <[email protected]>
> > Sent: Tuesday 18th February 2014 0:02
> > To: [email protected]
> > Subject: How to check URL that have been indexed by Solr?
> >
> > Hi,
> >
> > Sometimes we accidentally crawl unneeded URL formats and push them all
> > the way into the last "solrindex" step.
> >
> > As we know, we can drop or delete those URLs by adding a regex to
> > regex-urlfilter.txt and running "nutch updatedb". Those URLs will then be
> > dropped/deleted from the crawldb database.
> >
> > But how can we ensure that URLs already indexed by Solr ("nutch solrindex")
> > before we ran "nutch updatedb" are also deleted?
> > Are those URLs also deleted when we perform "solrindex" again?
> >
> > Thank you.
> >
> > --
> > wassalam,
> > [bayu]

--
wassalam,
[bayu]
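
For reference, the find()-to-match() rewrite mentioned in the quoted reply could be sketched roughly as below. This is only a sketch under assumptions: the helper names `find_to_match` and `delete_by_query_body` are made up for illustration, the field name `url` is assumed to hold the page URL in the Solr schema, and the conversion is naive (it only handles `^`/`$` anchors and does not translate Java-only constructs that Lucene's regex syntax may not support). The input is the bare regex, without the leading `+` or `-` used in regex-urlfilter.txt.

```python
import json


def find_to_match(pattern: str) -> str:
    """Convert a find()-style regex into one that must match the whole value.

    Lucene regex queries are implicitly anchored at both ends, so explicit
    ^ and $ anchors are dropped and unanchored sides are padded with '.*'.
    """
    anchored_start = pattern.startswith("^")
    # Treat a trailing '$' as an anchor unless it is escaped as '\$'.
    anchored_end = pattern.endswith("$") and not pattern.endswith("\\$")

    core = pattern[1:] if anchored_start else pattern
    if anchored_end:
        core = core[:-1]

    prefix = "" if anchored_start else ".*"
    suffix = "" if anchored_end else ".*"
    return prefix + core + suffix


def delete_by_query_body(field: str, pattern: str) -> str:
    """Build a JSON delete-by-query body using a /regex/ query on `field`."""
    # Forward slashes delimit the regex in the query syntax, so escape them.
    regex = find_to_match(pattern).replace("/", "\\/")
    return json.dumps({"delete": {"query": f"{field}:/{regex}/"}})


if __name__ == "__main__":
    # Hypothetical filter pattern: URLs ending in .pdf
    print(delete_by_query_body("url", r"\.pdf$"))
```

The resulting body would be posted to the core's update handler and followed by a commit. It is probably safer to run the same query as a normal search first and check which documents it matches before deleting anything.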

