Excellent, Markus. We are slowly but surely getting nearer... :0)

On Tue, Aug 30, 2011 at 1:14 AM, Markus Jelsma <[email protected]> wrote:

> > Hi,
> >
> > As the title suggests, I'm in the process of getting some comprehensive
> > documentation sorted out for Nutch, this obviously starts at wiki level.
> > I'm currently working on the IndexStructure page [1]. I would appreciate
> > if some guys could have a quick look and correct where they see fit.
> >
> > In addition I have a couple of quick questions regarding the last 4 fields
> > I'm trying to account for
> >
> > 1) BOOST - As far as I am aware this was deprecated in Nutch 1.2 or Nutch
> > 1.1... correct/wrong?
>
> This would be value of the scoring filter, OPIC or LinkRank or some custom
> made scoring.
>
> > 2) DIGEST - Don't have a clue
>
> The digest of the document. Can be MD5 over content and headers or more
> sophisticated text profile of the content.
>
> > 3) SEGMENT - as 2
>
> The originating segment of the document, used to identify the most recent
> segment in which you can find this document. In older Nutch version this was
> also used (IIRC) to load a `cached` version of the document.
>
> > 4) TIMESTAMP - as 2
>
> Most recent fetch time.
>
> > Would be great if people could fill me in with the grey areas please.
> >
> > Finally, what a job all contributors, dev's and committers made cleaning up
> > plugin directory even between Nutch 1.2 and 1.3 release. It's not until you
> > see previous versions on SVN that you can fully appreciate the excellent
> > job that has been made with 1.3 release. :0)
> >
> > [1] http://wiki.apache.org/nutch/IndexStructure

--
*Lewis*

