Hi Grant,

On 12/02/2009 at 2:30 PM, Grant Ingersoll wrote:
> I've been noodling around with the notion of a
> "layered" field where variants of a primary token are stored at
> "sub positions" of the primary token (instead of in separate copy
> fields)

The Indri search engine (now part of Lemur) uses a similar idea: fields are 
implemented as potentially overlapping extents over the (single) stream of 
document tokens.  (Howard Turtle, who is now the CNLP director, and has been 
involved in Indri development, told me about this feature.  He says it allows 
for natural representation of fields projected onto hierarchical data, e.g. 
XML.)  I wasn't able to find much documentation about this online when I looked 
just now, but here's a high-level overview of the Indri "repository" (aka 
index) structure:

http://www.lemurproject.org/docs/index.php/Indri_Repository_Structure

(See the "Field Information Files" section near the bottom.)
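
To make the extent idea concrete, here is a rough sketch in Python.  It is 
not Indri's actual API or on-disk format; the names Extent, fields_at and 
tokens_in are hypothetical, and the half-open [begin, end) convention is just 
an assumption for illustration.  It only shows how overlapping fields can be 
laid over a single token stream:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Extent:
    """A field occurrence over the token stream (hypothetical structure)."""
    field: str
    begin: int  # first token position covered (inclusive)
    end: int    # one past the last token position covered (exclusive)


def fields_at(extents: List[Extent], position: int) -> List[str]:
    """Return the names of all fields whose extent covers the position."""
    return [e.field for e in extents if e.begin <= position < e.end]


def tokens_in(tokens: List[str], extents: List[Extent], field: str) -> List[List[str]]:
    """Return the token span of every extent belonging to the named field."""
    return [tokens[e.begin:e.end] for e in extents if e.field == field]


# One token stream per document; fields are just ranges over it.
tokens = ["lucene", "in", "action", "by", "erik", "hatcher"]

# Extents may nest or overlap, much as XML elements nest.
extents = [
    Extent("doc", 0, 6),
    Extent("title", 0, 3),
    Extent("author", 4, 6),
]

print(fields_at(extents, 1))
print(tokens_in(tokens, extents, "author"))
```

Because each token is stored only once, a token at position 1 belongs to both 
"doc" and "title" without being copied into a separate field.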

Steve
