I can't see any of your attachments, as they're not permitted on the list. Can you provide a URL?

On Thu, Apr 5, 2012 at 9:56 PM, alessio crisantemi <[email protected]> wrote:

> Dear Lewis, thank you for your fast reply.
> But that is exactly my problem! I don't understand which field creates
> this line.
>
> I see a date (e.g. "Mercoledì Apr 04") followed by the word "parent",
> then ">" and then the names of the categories ("Home NEWSLOT/VLT SCOMMESSE
> ONLINE LOTTERIE Politica Video Live Score").
>
> Do you know which field of the default Nutch configuration generates the
> 'parent' line?
>
> As you can see in the attachment, this line is in the content field,
> between 'str' tags.
>
> Any suggestions?
> Thanks,
> a.
>
> On 5 April 2012 at 22:45, Lewis John Mcgibbney <[email protected]> wrote:
>
> > Hi Alessio,
> >
> > You need to determine in which field the unwanted content exists. Once
> > you've done this you could write an indexing filter to remove this from
> > your document prior to indexing.
> >
> > Lewis
> >
> > On Thu, Apr 5, 2012 at 9:41 PM, alessio crisantemi <[email protected]> wrote:
> >
> > > ---------- Forwarded message ----------
> > > From: alessio crisantemi <[email protected]>
> > > Date: 5 April 2012 22:32
> > > Subject: request about snippets
> > > To: [email protected]
> > >
> > > Dear all,
> > > I configured my Nutch (1.4) to work with Solr (1.4.1), and I crawl and
> > > index my website successfully.
> > >
> > > I only have a problem with my search results.
> > > In all results, the snippets contain a line with a string listing all
> > > the categories of my website. I attached a screenshot to explain:
> > > there, the unwanted line is "Mercoledì Apr 04 parent"> Home NEWSLOT/VLT
> > > SCOMMESSE ONLINE LOTTERIE Politica Video Live Score".
> > >
> > > This is a problem because Solr reads the same line for every page, so
> > > when my query is a word from this line (e.g. 'ONLINE') I get my whole
> > > Solr index as the result.
> > >
> > > How can I skip this line during my crawl? Is it possible to exclude
> > > this line?
> > > Thank you in advance,
> > > alessio
> >
> > --
> > *Lewis*

--
*Lewis*
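
To illustrate the suggestion quoted above (remove the unwanted text from the content field before the document is indexed), here is a minimal sketch of the string clean-up alone. It is not Nutch code: Nutch indexing filters are Java plugins, and this Python only shows the kind of logic such a filter might apply. The function name strip_navigation and the pattern are hypothetical, and the menu text is simply the string reported in the thread; a real site may need a different pattern.

```python
import re

# Assumption: the navigation text is exactly the string quoted in the thread.
NAV_LABELS = "Home NEWSLOT/VLT SCOMMESSE ONLINE LOTTERIE Politica Video Live Score"

# Optionally match the leading date and the stray 'parent">' fragment,
# then the menu labels themselves.
BOILERPLATE = re.compile(
    r'(?:\S+\s+[A-Z][a-z]{2}\s+\d{2}\s+parent">\s*)?' + re.escape(NAV_LABELS)
)


def strip_navigation(content: str) -> str:
    """Remove the repeated menu line from page text (hypothetical helper)."""
    cleaned = BOILERPLATE.sub(" ", content)
    # Collapse whitespace left behind by the removal.
    return re.sub(r"\s+", " ", cleaned).strip()


if __name__ == "__main__":
    page = ('Mercoledì Apr 04 parent"> Home NEWSLOT/VLT SCOMMESSE ONLINE '
            'LOTTERIE Politica Video Live Score Article text about ONLINE games')
    print(strip_navigation(page))
```

With the menu removed, a query such as 'ONLINE' would only match pages whose own text contains the word, rather than every page in the index.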

