Excellent, Markus.

We are slowly but surely getting nearer... :0)

On Tue, Aug 30, 2011 at 1:14 AM, Markus Jelsma
<[email protected]> wrote:

>
> > Hi,
> >
> > As the title suggests, I'm in the process of getting some comprehensive
> > documentation sorted out for Nutch, this obviously starts at wiki level.
> > I'm currently working on the IndexStructure page [1]. I would appreciate
> > if some guys could have a quick look and correct where they see fit.
> >
> > In addition I have a couple of quick questions regarding the last 4 fields
> > I'm trying to account for
> >
> > 1) BOOST - As far as I am aware this was deprecated in Nutch 1.2 or Nutch
> > 1.1... correct/wrong?
>
> This would be the value from the scoring filter: OPIC, LinkRank or some
> custom-made scoring.
>
> > 2) DIGEST - Don't have a clue
>
> The digest of the document. Can be MD5 over content and headers or a more
> sophisticated text profile of the content.
>
> > 3) SEGMENT - as 2
>
> The originating segment of the document, used to identify the most recent
> segment in which you can find this document. In older Nutch versions this was
> also used (IIRC) to load a `cached` version of the document.
>
> > 4) TIMESTAMP - as 2
>
> Most recent fetch time.
>
> >
> > Would be great if people could fill me in with the grey areas please.
> >
> > Finally, what a job all contributors, devs and committers made cleaning up
> > the plugin directory even between the Nutch 1.2 and 1.3 releases. It's not
> > until you see previous versions on SVN that you can fully appreciate the
> > excellent job that has been done with the 1.3 release.  :0)
> >
> > [1] http://wiki.apache.org/nutch/IndexStructure
>



-- 
*Lewis*
