Yes, correct: the summary is retrieved from parse_text and not from the content, so it is not affected by this property. I should have checked that, but I am happy that it works so far.
Regarding the URL: I am not 100% sure it will serve your needs, but you can investigate usage of the org.apache.nutch.net.RegexUrlNormalizer.

Rgrds, Thomas

On 6/17/06, Roberto Monge <[EMAIL PROTECTED]> wrote:
> Thanks, I didn't see the fetcher.store.content attribute in 0.7 so I
> updated to 0.8-dev. It worked as advertised, and it seems like summary and
> search context still work. The only thing affected was the cached view of
> the file. I didn't limit http.content.limit because I do want all of the
> log files indexed.
>
> My log files are on one server's filesystem. I want to index them via a
> local search (file:///logs) but then present the URL link as coming from an
> http root so that other users can fetch the files. Currently I fetch them
> from the local web server, but that's a little inefficient since I know
> where the files are locally on the FS. Has anyone done a local search but
> used http URLs for the search results?
>
> I could modify search.jsp to replace my file:// root with an http root, but
> that seems a little hacky. Does anyone know if there is a regex-url filter
> for post-processing of the link URLs? I tried using the regex-url filter
> but it modified the URL before the fetcher used it. I want to modify it via
> regex when it is entered into the URL index or when it is displayed.
>
> Thanks,
>
> -roberto
>
>
> On 6/15/06, TDLN <[EMAIL PROTECTED]> wrote:
> >
> > I mean disable the cache link in the search.jsp.
> >
> > On 6/15/06, TDLN <[EMAIL PROTECTED]> wrote:
> > > As far as I know, content in the segments is used to generate the
> > > summary in the search results and of course for the cache feature.
> > >
> > > If you don't need these you can adjust the fetcher.store.content and
> > > http.content.limit config properties. Also you might have to change
> > > search.jsp.
> > >
> > > Rgrds, Thomas
> > >
> > > On 6/15/06, Roberto Monge <[EMAIL PROTECTED]> wrote:
> > > > I've been using Nutch to index production log files from a client
> > > > application.
> > > > It's been a great tool because we do get a large volume of
> > > > logs from the field and often have to go through complicated pattern
> > > > searches. Lately we've had some issues managing our disk space. I
> > > > noticed that Nutch keeps all of the content in the segments content
> > > > folder. Is there a reason all of the content is stored? I didn't see
> > > > any obvious setting for just indexing and not keeping the content.
> > > >
> > > > I do use the more search plugins to do filtering by date and URL.
> > > > Maybe these require the content in the content folders? Any help
> > > > would be much appreciated.
> > > >
> > > > Roberto

_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general
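
For the display-time rewrite discussed in the thread (swapping the file:// root for an http root when the result link is shown), here is a minimal sketch in plain Java. It uses only java.util.regex and no Nutch API, so it says nothing about how RegexUrlNormalizer itself is configured. The class name LogUrlRewriter, the method toHttpUrl, and the host logserver.example.com are hypothetical placeholders, not anything from Nutch or from the original setup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogUrlRewriter {
    // Assumption: indexed URLs start with file:///logs/ (one or more slashes
    // after "file:" are accepted).
    private static final Pattern FILE_ROOT = Pattern.compile("^file:/+logs/");

    // Assumption: hypothetical http root that serves the same log directory.
    private static final String HTTP_ROOT = "http://logserver.example.com/logs/";

    // Returns the URL with the file root replaced by the http root.
    // URLs that do not start with the file root are returned unchanged.
    public static String toHttpUrl(String url) {
        // quoteReplacement guards against '$' or '\' in the replacement text.
        return FILE_ROOT.matcher(url)
                .replaceFirst(Matcher.quoteReplacement(HTTP_ROOT));
    }

    public static void main(String[] args) {
        System.out.println(toHttpUrl("file:///logs/app/2006-06-17.log"));
        System.out.println(toHttpUrl("http://other.example.com/page.html"));
    }
}
```

A helper like this could be called where the hit URL is written out in search.jsp, which leaves the URL the fetcher sees untouched. That is the same "hacky" approach Roberto mentions, only isolated in one place.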