What is a 'breadcrumb', Markus?

On 5 April 2012 23:08, Markus Jelsma <markus.jel...@openindex.io> wrote:
> Seems to me it's just the breadcrumb of the page popping up in Solr's
> highlighter snippet?
>
> On Thu, 5 Apr 2012 22:02:31 +0100, Lewis John Mcgibbney
> <lewis.mcgibb...@gmail.com> wrote:
>
>> I can't see any of your attachments as they're not permitted on list.
>>
>> Can you provide a URL?
>>
>> On Thu, Apr 5, 2012 at 9:56 PM, alessio crisantemi
>> <alessio.crisant...@gmail.com> wrote:
>>
>>> Dear Lewis, thank you for your fast reply.
>>> But that is exactly my problem! I don't understand which field
>>> creates this line.
>>>
>>> I see a date (e.g. "Mercoledì Apr 04") followed by the word "parent"
>>> and, after ">", the names of the categories ("Home NEWSLOT/VLT
>>> SCOMMESSE ONLINE LOTTERIE Politica Video Live Score").
>>>
>>> Do you know which field of the default Nutch configuration generates
>>> the 'parent' line?
>>>
>>> As you can see in the attachment, this line is in the content field,
>>> between 'str' tags.
>>> ..
>>> Suggestions?
>>> tx
>>> a.
>>>
>>> On 5 April 2012 22:45, Lewis John Mcgibbney
>>> <lewis.mcgibb...@gmail.com> wrote:
>>>
>>> > Hi Alessio,
>>> >
>>> > You need to determine in which field the unwanted content exists.
>>> > Once you've done this you could write an indexing filter to remove
>>> > this from your document prior to indexing.
>>> >
>>> > Lewis
>>> >
>>> > On Thu, Apr 5, 2012 at 9:41 PM, alessio crisantemi
>>> > <alessio.crisant...@gmail.com> wrote:
>>> >
>>> > > ---------- Forwarded message ----------
>>> > > From: alessio crisantemi <alessio.crisant...@gmail.com>
>>> > > Date: 5 April 2012 22:32
>>> > > Subject: request about snippets
>>> > > To: user@nutch.apache.org
>>> > >
>>> > > Dear all,
>>> > > I configured Nutch (1.4) to work with Solr (1.4.1), and I crawl
>>> > > and index my website successfully.
>>> > >
>>> > > I have only one problem, with the results of my searches.
>>> > > In all results, the snippets contain a line with a string where
>>> > > I can read all the categories of my website. I attached a
>>> > > screenshot to explain: here, the unwanted line is
>>> > > "Mercoledì Apr 04 parent"> Home NEWSLOT/VLT SCOMMESSE ONLINE
>>> > > LOTTERIE Politica Video Live Score ")
>>> > >
>>> > > This is a problem because Solr reads the same line on every
>>> > > page, so when my query matches a word in this line
>>> > > (e.g. 'ONLINE') I get my whole Solr index as the result.
>>> > >
>>> > > How can I skip this line during crawling? Is it possible to
>>> > > exclude it?
>>> > > Thank you in advance
>>> > > alessio
>>> >
>>> > --
>>> > *Lewis*
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536600 / 06-50258350
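For reference, the clean-up step Lewis suggests (removing the unwanted content before the document is indexed) could look roughly like the sketch below. It is only an illustration of the text-stripping logic, written in Python rather than as a Nutch plugin. In Nutch the same logic would live in a Java indexing filter, which is not shown here. The names NAV_MENU and strip_navigation are hypothetical, the menu labels are copied from the snippet quoted in the thread, and the stray 'parent">' prefix is assumed to appear exactly as in that snippet.

```python
import re

# Menu labels as quoted in the thread; an assumption, adjust to the real site menu.
NAV_MENU = "Home NEWSLOT/VLT SCOMMESSE ONLINE LOTTERIE Politica Video Live Score"

# Optional stray 'parent">' residue followed by the menu labels.
# Whitespace between labels is matched loosely because extraction may reflow it.
_NAV_PATTERN = re.compile(
    r'(?:parent"?\s*>\s*)?'
    + r"\s+".join(re.escape(label) for label in NAV_MENU.split())
)


def strip_navigation(content: str) -> str:
    """Remove the repeated menu run from page text and collapse leftover whitespace."""
    cleaned = _NAV_PATTERN.sub(" ", content)
    return re.sub(r"\s+", " ", cleaned).strip()


if __name__ == "__main__":
    sample = (
        'Mercoledì Apr 04 parent"> Home NEWSLOT/VLT SCOMMESSE ONLINE '
        "LOTTERIE Politica Video Live Score Article text about ONLINE betting."
    )
    print(strip_navigation(sample))
    # prints: Mercoledì Apr 04 Article text about ONLINE betting.
```

Only the exact menu run is removed, so the word ONLINE inside the article body is still searchable. The date in front of the menu is left alone, since the thread does not say whether it should go as well.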