Hi David,

Thanks. Is there a way in Nutch to reindex files based on their last
modified date?
I have a large number of PDF and DOC files in a folder. Do I need to reindex
all the files every time I want to update my index?

On 2/8/06, David Wallace <[EMAIL PROTECTED]> wrote:
>
> Hi Saravanaraj,
> For each URL, Nutch reads your filter file from top to bottom, until it
> finds a line (+ or -) that matches the URL.  Then it stops reading.
> Therefore, any files inside E:/Index Samples/Index/ will be INCLUDED,
> because they match the line that says +^file:/E:/Index Samples/.
>
> I suggest you swap over the two lines in the filter file: put
> -^file:/E:/Index Samples/Index/ BEFORE +^file:/E:/Index Samples/; so
> that Nutch encounters it first, when deciding whether to include files
> in that directory.
>
> Regards,
> David.
>
>
> On Mon, 2006-02-06 at 09:03 +0530, Saravanaraj Duraisamy wrote:
> > Hi, I am using Nutch to index files on the local FS and over FTP.
> >
> > My filter file is:
> >
> > -^(http|ftp|mailto):
> >
> > -\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|mpg|gz|rpm|tgz|mov|MOV|exe|png|PNG|jar)$
> > [EMAIL PROTECTED]
> > -.*(/.+?)/.*?\1/.*?\1/
> > +^file:/E:/Index Samples/
> > -^file:/E:/Index Samples/Index/
> >
> > But Nutch crawls the forbidden folders as well. Is there a WebDB kind of
> > thing for files too? Is it possible to make Nutch index files based on
> > the last modified date?
> >
> > Can anybody suggest a data structure for a WebDB (FileDB?) for files? It
> > would be good to group files and create separate segments for each group,
> > so that if some files change, only those segments need to be replaced.
> >
> > Rgds,
> > D.Saravanaraj
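
The first-match rule David describes above can be illustrated with a small sketch. This is not Nutch code: the function names `parse_rules` and `accepts` are made up for illustration, the file name in the test URL is hypothetical, and the sketch assumes that a URL matching no rule at all is rejected. It only shows why the order of the two `file:` lines matters.

```python
import re


def parse_rules(lines):
    """Turn '+regex' / '-regex' lines into (include, compiled_pattern) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Read rules top to bottom; the first pattern that matches decides."""
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False  # assumption: a URL that matches no rule is rejected


# Order from the original filter file: the broad '+' line comes first.
original = parse_rules([
    "+^file:/E:/Index Samples/",
    "-^file:/E:/Index Samples/Index/",
])

# Order David suggests: the narrower '-' line comes first.
swapped = parse_rules([
    "-^file:/E:/Index Samples/Index/",
    "+^file:/E:/Index Samples/",
])

url = "file:/E:/Index Samples/Index/report.pdf"  # hypothetical file name
print(accepts(original, url))  # included: the '+' line matches first
print(accepts(swapped, url))   # excluded: the '-' line matches first
```

With the original order the exclusion line is never reached for anything under E:/Index Samples/, which is why the forbidden folder was crawled.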
