Dear Lewis,

Thank you for your fast reply. But that is exactly my problem: I don't understand which field creates this row.
I can see a date (e.g. "Mercoledì Apr 04") followed by the word "parent", then a ">", and then the names of the categories ("Home NEWSLOT/VLT SCOMMESSE ONLINE LOTTERIE Politica Video Live Score"). Do you know which field of the default Nutch configuration generates the "parent" row? As you can see in the attachment, this row is in the content field, between the 'str' tags.

Any suggestions?

Thanks,
A.

On 5 April 2012 at 22:45, Lewis John Mcgibbney <[email protected]> wrote:

> Hi Alessio,
>
> You need to determine in which field the unwanted content exists. Once
> you've done this you could write an indexing filter to remove this from
> your document prior to indexing.
>
> Lewis
>
> On Thu, Apr 5, 2012 at 9:41 PM, alessio crisantemi <[email protected]> wrote:
>
> > ---------- Forwarded message ----------
> > From: alessio crisantemi <[email protected]>
> > Date: 5 April 2012 22:32
> > Subject: request about snippets
> > To: [email protected]
> >
> > Dear all,
> > I configured my Nutch (1.4) to work with Solr (1.4.1), and I crawl and
> > index my website successfully.
> >
> > I only have a problem with my search results.
> > In all results, the snippets contain a row with a string listing
> > all the categories of my website. I attached a screenshot to explain:
> > here, the bad row is "Mercoledì Apr 04 parent"> Home NEWSLOT/VLT
> > SCOMMESSE ONLINE LOTTERIE Politica Video Live Score ")
> >
> > This is a problem because, if Solr reads the same row for every page,
> > then when my query is a word from this row (e.g. 'ONLINE'), I get my
> > whole Solr index as the result.
> >
> > How can I skip this row during my crawl? Is it possible to exclude
> > this row?
> > Thank you in advance,
> > alessio
>
> --
> *Lewis*
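
The suggestion quoted above, removing the unwanted text from the document before it is indexed, could be sketched roughly as follows. This is only an illustration of the string-cleaning step, not a complete Nutch plugin: the class name `NavigationStripper` and the method `strip` are hypothetical, and it assumes the navigation menu always appears in the content field as exactly the category string quoted in this thread. Wiring such a helper into a Nutch indexing filter is not shown here.

```java
public class NavigationStripper {

    // Assumption: the site menu always shows up in the parsed content as this
    // exact sequence of category names (copied from the snippet in the thread).
    private static final String NAV_TEXT =
            "Home NEWSLOT/VLT SCOMMESSE ONLINE LOTTERIE Politica Video Live Score";

    // Removes every occurrence of the menu text and collapses the whitespace
    // left behind. Returns an empty string for null input.
    public static String strip(String content) {
        if (content == null) {
            return "";
        }
        String cleaned = content.replace(NAV_TEXT, " ");
        return cleaned.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Hypothetical page content: a date, the menu row, then the article text.
        String content = "Apr 04 " + NAV_TEXT + " Article text about ONLINE betting";
        System.out.println(strip(content));
    }
}
```

With the menu row removed, a query such as 'ONLINE' would only match pages whose own text contains that word. A literal match like this breaks as soon as the menu changes, so excluding the menu markup at parse time may be the more robust option if the site's HTML allows it.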

