Re: request about snippets (with attachment)

alessio crisantemi Thu, 05 Apr 2012 14:20:08 -0700

What is a 'breadcrumb', Markus?

On 5 April 2012 23:08, Markus Jelsma
<[email protected]> wrote:


> Seems to me it's just the breadcrumb of the page popping up in Solr's
> highlighter snippet?
>
>
>
> On Thu, 5 Apr 2012 22:02:31 +0100, Lewis John Mcgibbney <
> [email protected]> wrote:
>
>> I can't see any of your attachments as they're not permitted on list.
>>
>> Can you provide a URL?
>>
>> On Thu, Apr 5, 2012 at 9:56 PM, alessio crisantemi <
>> [email protected]> wrote:
>>
>>> Dear Lewis, thank you for your fast reply.
>>> But that is exactly my problem! I don't understand which field
>>> creates this line.
>>>
>>> I see a date (e.g. "Mercoledì Apr 04") followed by the word "parent",
>>> then ">" and the names of the categories ("Home NEWSLOT/VLT SCOMMESSE
>>> ONLINE LOTTERIE Politica Video Live Score").
>>>
>>> Do you know which field of the default Nutch configuration generates
>>> the 'parent' line?
>>>
>>> As you can see in the attachment, this line is in the content field,
>>> between 'str' tags.
>>> ..
>>> Any suggestions?
>>> Thanks,
>>> a.
>>>
>>> On 5 April 2012 22:45, Lewis John Mcgibbney <
>>> [email protected]> wrote:
>>>
>>> > Hi Alessio,
>>> >
>>> > You need to determine in which field the unwanted content exists. Once
>>> > you've done this you could write an indexing filter to remove this from
>>> > your document prior to indexing.
>>> >
>>> > Lewis
>>> >
>>> > On Thu, Apr 5, 2012 at 9:41 PM, alessio crisantemi <
>>> > [email protected]> wrote:
>>> >
>>> > >
>>> > >
>>> > > ---------- Forwarded message ----------
>>> > > From: alessio crisantemi <[email protected]>
>>> > > Date: 5 April 2012 22:32
>>> > > Subject: request about snippets
>>> > > To: [email protected]
>>> > >
>>> > >
>>> > > Dear all,
>>> > > I configured Nutch (1.4) to work with Solr (1.4.1), and I can crawl
>>> > > and index my website successfully.
>>> > >
>>> > > My only problem is with the search results.
>>> > > In every result, the snippet contains a line listing all the
>>> > > categories of my website. I attached a screenshot to illustrate:
>>> > > in it, the unwanted line is "Mercoledì Apr 04 parent"> Home
>>> > > NEWSLOT/VLT SCOMMESSE ONLINE LOTTERIE Politica Video Live Score".
>>> > >
>>> > > This is a problem because Solr reads the same line on every page, so
>>> > > when my query is a word from that line (e.g. 'ONLINE'), every
>>> > > document in my Solr index is returned as a result.
>>> > >
>>> > > How can I skip this line during crawling? Is it possible to exclude
>>> > > it?
>>> > > Thank you in advance,
>>> > > alessio
>>> > >
>>> > >
>>> >
>>> >
>>> > --
>>> > *Lewis*
>>> >
>>>
>>>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536600 / 06-50258350
>
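
The cleanup Lewis suggests in the quoted thread (removing the repeated text from the document before it is indexed) could look roughly like the sketch below. It is only an illustration in Python: Nutch indexing filters are Java plugins, so the same logic would need to be ported into one. The function name `strip_navigation` and the constant `NAV_TEXT` are hypothetical, and the sketch assumes the navigation text is identical on every page and matches the string quoted in the thread.

```python
import re

# Assumption: this navigation string is the same on every page. It is copied
# from the snippet quoted in the thread, not from any Nutch or Solr setting.
NAV_TEXT = "Home NEWSLOT/VLT SCOMMESSE ONLINE LOTTERIE Politica Video Live Score"


def strip_navigation(content: str, nav_text: str = NAV_TEXT) -> str:
    """Remove every occurrence of nav_text from content.

    The words of nav_text may be separated by any whitespace in content
    (spaces, tabs, line breaks), since extracted page text is often re-wrapped.
    """
    # Escape each word so characters such as "/" are matched literally,
    # then allow any run of whitespace between consecutive words.
    words = [re.escape(word) for word in nav_text.split()]
    pattern = re.compile(r"\s+".join(words))

    # Replace the navigation text with a single space, then normalise
    # whitespace so no double spaces are left behind.
    cleaned = pattern.sub(" ", content)
    return re.sub(r"\s+", " ", cleaned).strip()


if __name__ == "__main__":
    page = (
        "Mercoledì Apr 04 Home NEWSLOT/VLT  SCOMMESSE ONLINE\n"
        "LOTTERIE Politica Video Live Score Article text about ONLINE games"
    )
    print(strip_navigation(page))
```

Only the full navigation string is removed, so a word such as ONLINE that appears in the real article text stays searchable. If the unwanted text differs between pages, a fixed string will not be enough, and excluding the navigation markup at parse time may be a better fit.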
