Re: [Nutch-general] managing content size in segments folder

TDLN Sat, 17 Jun 2006 00:45:56 -0700

Yes, correct, the summary is retrieved from parse_text and not from the
content, so it is not affected by this property. I should have checked
that, but I am happy that it works so far.


Regarding the URL: I am not 100% sure it will serve your needs, but
you can investigate usage of the
org.apache.nutch.net.RegexUrlNormalizer.
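
As a rough illustration of the kind of regex substitution involved, here is
a minimal sketch using plain java.util.regex. It does not use the Nutch
normalizer API. The class name DisplayUrlRewriter and the host
logserver.example.com are hypothetical placeholders, and the file:///logs
root is taken from your description. Something like this could be applied
at display time rather than before fetching:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: maps an indexed file:// URL to the
// HTTP URL that should be shown to users in the search results.
class DisplayUrlRewriter {
    // Assumed local root of the indexed log files (file:///logs in this thread).
    private static final Pattern FILE_ROOT = Pattern.compile("^file:///logs/");

    // Assumed HTTP root; replace with the real web server location.
    private static final String HTTP_ROOT = "http://logserver.example.com/logs/";

    static String toDisplayUrl(String indexedUrl) {
        // quoteReplacement keeps any '$' or '\' in the root from being
        // interpreted as a group reference. URLs that do not start with
        // the file root are returned unchanged.
        return FILE_ROOT.matcher(indexedUrl)
                .replaceFirst(Matcher.quoteReplacement(HTTP_ROOT));
    }

    public static void main(String[] args) {
        System.out.println(toDisplayUrl("file:///logs/app/2006-06-17.log"));
    }
}
```

The fetcher would still see the original file:// URL; only the link shown
to the user changes.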

Rgrds, Thomas

On 6/17/06, Roberto Monge <[EMAIL PROTECTED]> wrote:
> Thanks, I didn't see the fetcher.store.content attribute in 0.7 so I
> updated to 0.8-dev.  It worked as advertised, and it seems like summary and
> search context still work.  The only thing affected was the cached view of
> the file.  I didn't limit http.content.limit because I do want all of the
> log files indexed.
>
> My log files are on one server's filesystem.  I want to index them via a local
> search of file:///logs but then present the url link as coming from an http root
> so that other users can fetch the files.  Currently I fetch them from the
> local webserver, but that's a little inefficient since I know where the files
> are locally on the FS.  Has anyone done a local search but used http urls
> for the search results?
>
> I could modify search.jsp to replace my file:// root with an http root, but
> that seems a little hacky.  Does anyone know if there is a regex-url filter
> for post-processing of the link urls?  I tried using the regex-url filter,
> but it modified the url before the fetcher used it.  I want to modify the url
> via regex when it is entered into the index or when it is displayed.
>
> Thanks,
>
> -roberto
>
>
> On 6/15/06, TDLN <[EMAIL PROTECTED]> wrote:
> >
> > I mean disable the cache link in the search.jsp.
> >
> > On 6/15/06, TDLN <[EMAIL PROTECTED]> wrote:
> > > As far as I know, content in the segments is used to generate the
> > > summary in the search results and of course for the cache feature.
> > >
> > > If you don't need these you can adjust the fetcher.store.content and
> > > http.content.limit config properties. Also you might have to change
> > > search.jsp.
> > >
> > > Rgrds, Thomas
> > >
> > > On 6/15/06, Roberto Monge <[EMAIL PROTECTED]> wrote:
> > > > I've been using nutch to index production log files from a client
> > > > application.  It's been a great tool because we do get a large volume
> > > > of logs from the field and often have to go through complicated pattern
> > > > searches.  Lately we're having some issues managing our disk space.  I
> > > > noticed that nutch keeps all of the content in the segments content
> > > > folder.  Is there a reason all of the content is stored?  I didn't see
> > > > any obvious setting for just indexing and not keeping the content.
> > > >
> > > > I do use the more search plugins to do filtering by date and url.
> > > > Maybe these require the content in the content folders?  Any help would
> > > > be much appreciated.
> > > >
> > > > Roberto
> > > >
> > > >
> > >
> >
>
>


_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general
