Well look, sharding is for distributed queries right?
Regardless of what fields you store in a nutch document, the reason you
store them is to have 'some' level of structure in your data.
Boost, id, tstamp etc. are merely fields which enable you to do so...
nothing more.
Nutch plugins (and some of the indexing core) enable you to add or remove
such fields in your index depending on what you want, how you feel, what
the weather is like etc. The id, boost, segment and digest fields are no
different.
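
As a rough illustration, the mappings Alex quoted would sit in
solrindex-mapping.xml something like the sketch here. Only the four
<field> lines come from this thread; the surrounding <mapping>/<fields>
wrapper is my assumption about the file layout, so check it against the
copy shipped in your conf directory.

```xml
<!-- Sketch only: the wrapper elements are assumed, not taken from this thread. -->
<mapping>
  <fields>
    <!-- Each entry maps a Nutch document field (source) to a Solr field (dest). -->
    <field dest="segment" source="segment"/>
    <field dest="boost" source="boost"/>
    <field dest="digest" source="digest"/>
    <field dest="tstamp" source="tstamp"/>
  </fields>
</mapping>
```

As far as I understand, deleting a line here does not stop the indexer
from emitting that field, so schema.xml still has to declare it or Solr
will reject the document.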


On Saturday, February 16, 2013,  <[email protected]> wrote:
> Do you mean  they help when sharding?
>
> Thanks.
> Alex.
>
> -----Original Message-----
> From: Lewis John Mcgibbney <[email protected]>
> To: user <[email protected]>
> Sent: Sat, Feb 16, 2013 10:58 am
> Subject: Re: fields in solrindex-mapping.xml
>
>
> In short, it helps with searching when you can slice your data using these
> fields
>
> On Saturday, February 16, 2013, Markus Jelsma <[email protected]>
> wrote:
>> Those are added by IndexerMapReduce (or 2.x equivalent) and index-basic.
>> They contain the crawl datum's signature, the time stamp (see index-basic)
>> and crawl datum score. If you think you don't need them, you can safely
>> omit them.
>>
>> -----Original message-----
>>> From:[email protected] <[email protected]>
>>> Sent: Sat 16-Feb-2013 19:21
>>> To: [email protected]
>>> Subject: Re: fields in solrindex-mapping.xml
>>>
>>> Hi Lewis,
>>>
>>> Why do we need to include digest, tstamp, boost and batchid fields in
>>> solrindex?
>>>
>>> Thanks.
>>> Alex.
>>>
>>> -----Original Message-----
>>> From: Lewis John Mcgibbney <[email protected]>
>>> To: user <[email protected]>
>>> Sent: Fri, Feb 15, 2013 4:21 pm
>>> Subject: Re: fields in solrindex-mapping.xml
>>>
>>>
>>> Hi Alex,
>>> OK so we can certainly remove segment from 2.x solrindex-mapping.xml. It
>>> would however be nice to replace this with the appropriate batchId.
>>> Can someone advise where the 'segment' field currently comes from in
>>> trunk?
>>> That way we can at least map the field to the batchId equivalent in 2.x
>>>
>>> Thank you
>>> Lewis
>>>
>>> On Fri, Feb 15, 2013 at 2:23 PM, <[email protected]> wrote:
>>>
>>> > Hi Lewis,
>>> >
>>> > If I exclude one of the fields tstamp, digest, and boost from
>>> > solrindex-mapping and schema.xml, solrindex gives an error
>>> >
>>> > SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=com.yahoo:http/] unknown field 'tstamp'
>>> >
>>> > for each of the above fields, except segment.
>>> >
>>> > Alex.
>>> >
>>> > -----Original Message-----
>>> > From: Lewis John Mcgibbney <[email protected]>
>>> > To: user <[email protected]>
>>> > Sent: Thu, Feb 14, 2013 8:34 pm
>>> > Subject: Re: fields in solrindex-mapping.xml
>>> >
>>> >
>>> > Hi Alex,
>>> > Tstamp represents fetch time, used for deduplication.
>>> > Boost is for scoring-opic and link. This is required in 2.x as well.
>>> > I don't have the code right now, but you can try removing digest and
>>> > segment. To me they both look legacy.
>>> > There is a wiki page on index structure which you can consult and/or
>>> > add to should you wish.
>>> > Thank you
>>> > Lewis
>>> >
>>> > On Thursday, February 14, 2013,  <[email protected]> wrote:
>>> > > Hello,
>>> > >
>>> > > I see that there are
>>> > >
>>> > >                 <field dest="segment" source="segment"/>
>>> > >                 <field dest="boost" source="boost"/>
>>> > >                 <field dest="digest" source="digest"/>
>>> > >                 <field dest="tstamp" source="tstamp"/>
>
>
>

-- 
*Lewis*
