Hello Hany,

Sure, check these commands:

 solrclean         remove HTTP 301 and 404 documents from solr - DEPRECATED
                   use the clean command instead
 clean             remove HTTP 301 and 404 documents and duplicates from
                   indexing backends configured via plugins
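
As a rough sketch of how the clean command might be invoked: the install
location (/opt/nutch) and the crawldb path (crawl/crawldb) below are
placeholders, not values from this thread, and the argument form assumes the
1.x usage "CleaningJob <crawldb> [-noCommit]". Please check the usage output
of your own version before relying on it.

```shell
# Placeholder locations (assumptions); override via the environment.
NUTCH_HOME="${NUTCH_HOME:-/opt/nutch}"
CRAWLDB="${CRAWLDB:-crawl/crawldb}"
NUTCH_BIN="$NUTCH_HOME/bin/nutch"

if [ -x "$NUTCH_BIN" ]; then
  # Run the clean job against the crawldb; deletions go to the
  # indexing backends configured via plugins.
  "$NUTCH_BIN" clean "$CRAWLDB"
else
  # Nutch not found at the assumed location: only show the command.
  echo "would run: $NUTCH_BIN clean $CRAWLDB"
fi
```

Running this after each crawl cycle, once the crawldb has been updated, is
one way to keep the index in step with the crawldb.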

Regards,
Markus

On Tue, 9 Mar 2021 at 08:49, Hany NASR <[email protected]> wrote:

> Hello Markus,
>
> I added the property in nutch-site.xml with no luck.
>
> The documents still exist in Solr; any advice?
>
> Regards,
> Hany
>
> From: Markus Jelsma <[email protected]>
> Sent: Monday, March 8, 2021 3:40 PM
> To: [email protected]
> Subject: EXTERNAL: Re: 301 perm redirect pages are still in Solr
>
> Hello Hany,
>
> You need to tell the indexer to delete those records. This will help:
>
>   <!-- delete gone and redirects -->
>  <property>
>    <name>indexer.delete</name>
>    <value>true</value>
>  </property>
>
> Regards,
> Markus
>
> On Mon, 8 Mar 2021 at 15:31, Hany NASR <[email protected]> wrote:
>
> > Hi All,
> >
> > I'm using Nutch 1.15 and found that permanent redirect pages (301)
> > are still indexed in Solr and are not removed.
> >
> > When I exported the CrawlDb, the page had Status: 5 (db_redir_perm).
> >
> > How can I keep the Solr index up to date and make Nutch clean these
> > pages automatically?
> >
> > Regards,
> > Hany
> >
> > -----------------------------------------
> > SAVE PAPER - THINK BEFORE YOU PRINT!
> >
> > This E-mail is confidential.
> >
> > It may also be legally privileged. If you are not the addressee you may
> > not copy, forward, disclose or use any part of it. If you have received
> > this message in error, please delete it and all copies from your system
> > and notify the sender immediately by return E-mail.
> >
> > Internet communications cannot be guaranteed to be timely secure, error
> > or virus-free.
> > The sender does not accept liability for any errors or omissions.
> >
>
