Hi Markus,

Thanks for the hints.

Regards,


On Tue, Feb 18, 2014 at 3:25 PM, Markus Jelsma
<[email protected]> wrote:

> Right now the only way to do that is to run manual deleteByQuery operations
> using Lucene's support for regex queries. Keep in mind that Nutch's filter
> does a find() whereas Lucene needs a match(), so you have to rewrite the
> queries.
>
> -----Original message-----
> > From: Bayu Widyasanyata <[email protected]>
> > Sent: Tuesday 18th February 2014 0:02
> > To: [email protected]
> > Subject: How to check URL that have been indexed by Solr?
> >
> > Hi,
> >
> > Sometimes we accidentally crawl URLs in formats we don't need and push
> > them all the way through to the final "solrindex" step.
> >
> > As we know, we can drop those URLs by adding a regex to
> > regex-urlfilter.txt and running "nutch updatedb". Those URLs are then
> > dropped/deleted from the crawldb.
> >
> > But how can we make sure that URLs which were already indexed by Solr
> > ("nutch solrindex") before we ran "nutch updatedb" are also deleted?
> > Is a URL also deleted when we run "solrindex" again?
> >
> > Thank you.
> >
> > --
> > wassalam,
> > [bayu]
> >
>
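To illustrate the find()/match() difference Markus mentions, here is a minimal Java sketch. It assumes that Nutch stores the page URL in the Solr `id` field and that a filter rule uses at most a leading `^` and a trailing `$` as anchors; the class and method names are hypothetical and not part of Nutch or Solr. Java regex and Lucene's regex dialect also differ (shorthand classes such as `\d` may not mean the same thing, and URL characters such as `&` or `~` may be treated as operators by Lucene), so each rewritten query should be checked before deleting anything.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Nutch or Solr): converts a pattern from
 * regex-urlfilter.txt, which Nutch applies with Matcher.find(), into a
 * whole-string pattern for a Lucene regex query, which must match the
 * entire indexed term.
 */
public class UrlFilterToSolrQuery {

    /** Nutch-style test: the pattern may match anywhere inside the URL. */
    static boolean nutchStyleMatch(String pattern, String url) {
        return Pattern.compile(pattern).matcher(url).find();
    }

    /** Lucene-style test: the pattern must match the whole URL. */
    static boolean wholeStringMatch(String pattern, String url) {
        return Pattern.compile(pattern).matcher(url).matches();
    }

    /**
     * Rewrites a find()-style pattern into a matches()-style pattern.
     * Simplification: only a leading '^' and a trailing unescaped '$'
     * are recognised as anchors.
     */
    static String toWholeMatch(String findPattern) {
        String core = findPattern;
        boolean anchoredStart = core.startsWith("^");
        if (anchoredStart) {
            core = core.substring(1);
        }
        boolean anchoredEnd = core.endsWith("$") && !core.endsWith("\\$");
        if (anchoredEnd) {
            core = core.substring(0, core.length() - 1);
        }
        // Unanchored sides get ".*"; the group keeps a top-level a|b intact.
        return (anchoredStart ? "" : ".*") + "(" + core + ")"
                + (anchoredEnd ? "" : ".*");
    }

    /** Builds a query string for deleteByQuery; the field name is an assumption. */
    static String toSolrRegexQuery(String field, String findPattern) {
        // In the Lucene/Solr query syntax a regex is delimited by slashes,
        // so every literal '/' inside it is escaped as '\/'.
        String regex = toWholeMatch(findPattern).replace("/", "\\/");
        return field + ":/" + regex + "/";
    }

    public static void main(String[] args) {
        String filter = "\\.(gif|jpg|png)$";   // hypothetical filter rule
        String url = "http://example.com/img/logo.gif";

        System.out.println(nutchStyleMatch(filter, url));                 // true
        System.out.println(wholeStringMatch(filter, url));                // false
        System.out.println(wholeStringMatch(toWholeMatch(filter), url));  // true
        System.out.println(toSolrRegexQuery("id", filter)); // id:/.*(\.(gif|jpg|png))/
    }
}
```

The resulting query string could then be sent as a delete-by-query request, for example through SolrJ's `deleteByQuery` or an XML `<delete><query>...</query></delete>` update message, followed by a commit. Running the same query as a normal search first shows which documents would be removed.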



-- 
wassalam,
[bayu]
