It appears that I also somehow picked up a bunch of extra documents in my
original crawl or subsequent recrawl.
Can anyone give me an example of the prune command used in two ways:
1. delete all entries that contain a certain term;
2. delete all entries from a certain URL (a sketch of both follows below).
Thanks for any help anyone can offer.
Matt
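A minimal sketch of both cases, assuming the rules file passed to the
prune command (see Dima's message quoted below) holds one Lucene query
per line and that every index entry matching a query is deleted; the
field names and values here are illustrative, not confirmed against any
particular Nutch version.

For case 1 (entries containing a certain term), the rules file would
hold something like:

  content:someterm

For case 2 (entries from a certain URL):

  url:"http://www.example.com/unwanted-page.html"

Running bin/nutch prune with no arguments should print the exact usage
for your version.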
----- Original Message -----
From: "TDLN" <[EMAIL PROTECTED]>
To: <[email protected]>; "Dima Mazmanov" <[EMAIL PROTECTED]>
Sent: Friday, June 23, 2006 10:52 AM
Subject: Re: Deleting documents
Prune is OK to remove the docs from the index, but it will not prevent
the pages from being refetched, so you might also want to change the
regex-urlfilter (or crawl-urlfilter if you are using the crawl tool)
for that purpose.
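For instance, an entry like the following near the top of
conf/regex-urlfilter.txt should stop an unwanted host from being
refetched (the host is illustrative; rules are applied top to bottom
and the first match wins, so the '-' line must come before the stock
'+.' catch-all at the end of the file):

  # reject everything from the unwanted host
  -^http://www\.example\.com/
  # stock catch-all that accepts everything else
  +.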
Rgrds, Thomas
On 6/22/06, Dima Mazmanov <[EMAIL PROTECTED]> wrote:
Hi, Rajesh.
Use "prune" tool.
./nutch prune /path/to/segments/dir /path/to/file/with/rules
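The second argument appears to be a plain-text file of Lucene queries,
one per line, with every matching document pruned from the index; for
example (assuming the index carries the usual site field), a file
containing just

  site:example.com

should drop everything indexed from that host.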
You wrote on 21 June 2006 at 20:35:34:
> I would like to delete certain documents from the crawled documents
> depending on certain criteria. Is there a way to achieve this? My
> guess is that Nutch downloads all the files before parsing them.
--
Regards,
Dima mailto:[EMAIL PROTECTED]