I think I found an answer to my own question:

A Field object contains a name (a String) and a value (a String or a Reader), and three booleans that control whether or not the value will be indexed for searches, tokenized prior to indexing, and stored in the index so it can be returned with the search.
Let me explain those three booleans a bit more.

 - *Indexed for searches* - sometimes you'll want to have fields available in your Documents that don't really have anything to do with searching. Two examples I can think of off the top of my head are creation dates and file names, so you can compare when the Document was created against the file modification date, and decide if the document needs to be reindexed. Since these fields won't ever make sense to use in an actual search, you can decrease the amount of work Lucene does by marking them as not indexed for searches.

 - *Tokenized prior to indexing* - tokenizing refers to taking a piece of text and cleaning it up, and breaking it down into individual pieces (tokens) for the indexer. This is done by the Analyzer. Some fields you may not want to be tokenized, for example a serial number field.

 - *Stored in the index* - even if a field is entirely indexed, it doesn't necessarily mean that it'll be easy for Lucene to reconstruct it. Although Lucene is a search index, and not a database, if your fields are reasonably small, you can ask Lucene to store them in the index. With the fields stored in the index, instead of using the Document to locate the original file or data and load it, you can actually pull the data out of the Document. This works best with fairly small fields and documents that you'd need to parse for display anyway.

Some fields contain bulk data and are so large that you don't really want to store them in the index. You can still make your life a little easier by storing not just the filename, but a Reader object in the Field. This makes it simpler for your application to just get the Reader out of the Hit and use it to read in the data to display it to the user.

From http://darksleep.com/lucene/

Sorry for the noise.

-John

On Wed, Jun 25, 2008 at 2:58 PM, John Thompson <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm trying to understand the members of the Field class. According to
> http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/document/Field.html :
>
> Field.Index.NO implies:
>
> Do not index the field value. This field can thus not be searched, but one
> can still access its contents provided it is stored
> <http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/document/Field.Store.html>.
>
> But Field.Store.YES implies:
>
> Store the original field value in the index. This is useful for short texts
> like a document's title which should be displayed with the results. The
> value is stored in its original form, i.e. no analyzer is used before it is
> stored.
>
> I'm not sure I understand the relationship between indexing and storing.
> According to the above: I can still access a field's content if I have not
> indexed its value, as long as I have stored that field. But storing a field
> is by definition "storing the original field value in the index."
>
> *scratches head*
>
> What is the difference between "indexing a field value" and "storing an
> original field value in the index"?
>
> -John
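
To make the indexed / tokenized / stored distinction concrete, here is a toy sketch in plain Java. It is only a model of the idea and not how Lucene is implemented: it uses no Lucene classes, and ToyField, ToyIndex, addDocument, search and storedValue are hypothetical names made up for this illustration. The "tokenizer" is just lowercasing plus a split on non-word characters, standing in for a real Analyzer. Roughly, indexed=false plays the role of Field.Index.NO and stored=true the role of Field.Store.YES; as far as I can tell, tokenized=true/false corresponds to Field.Index.TOKENIZED and Field.Index.UN_TOKENIZED in the 2.2 API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

/** Toy stand-in for a field: a name, a value, and the three booleans. */
class ToyField {
    final String name;
    final String value;
    final boolean indexed;   // can the value be searched?
    final boolean tokenized; // is the value split into terms before indexing?
    final boolean stored;    // is the original value kept for retrieval?

    ToyField(String name, String value, boolean indexed, boolean tokenized, boolean stored) {
        this.name = name;
        this.value = value;
        this.indexed = indexed;
        this.tokenized = tokenized;
        this.stored = stored;
    }
}

/** Toy index (hypothetical, not Lucene): two separate structures. */
class ToyIndex {
    // Inverted index: "field:term" -> ids of the documents containing that term.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    // Stored values: per document, field name -> original, unanalyzed value.
    private final List<Map<String, String>> storedValues = new ArrayList<>();

    int addDocument(ToyField... fields) {
        int docId = storedValues.size();
        Map<String, String> kept = new HashMap<>();
        for (ToyField f : fields) {
            if (f.indexed) {
                // Tokenized fields become several terms; untokenized ones stay one term.
                String[] terms = f.tokenized
                        ? f.value.toLowerCase(Locale.ROOT).split("\\W+")
                        : new String[] { f.value };
                for (String term : terms) {
                    if (!term.isEmpty()) {
                        postings.computeIfAbsent(f.name + ":" + term, k -> new TreeSet<>()).add(docId);
                    }
                }
            }
            if (f.stored) {
                kept.put(f.name, f.value); // kept verbatim, no analysis applied
            }
        }
        storedValues.add(kept);
        return docId;
    }

    /** Looks up a single term; only indexed fields can ever match. */
    List<Integer> search(String field, String term) {
        TreeSet<Integer> hits = postings.get(field + ":" + term);
        return hits == null ? new ArrayList<Integer>() : new ArrayList<Integer>(hits);
    }

    /** Returns the original value, or null if the field was not stored. */
    String storedValue(int docId, String field) {
        return storedValues.get(docId).get(field);
    }
}

class ToyIndexDemo {
    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        int doc = index.addDocument(
                new ToyField("title", "Lucene in Action", true, true, true),
                new ToyField("path", "/docs/lia.pdf", false, false, true),
                new ToyField("body", "full text search", true, true, false));
        System.out.println("title hits: " + index.search("title", "lucene"));
        System.out.println("path hits:  " + index.search("path", "/docs/lia.pdf"));
        System.out.println("path value: " + index.storedValue(doc, "path"));
        System.out.println("body value: " + index.storedValue(doc, "body"));
    }
}
```

In this model the "path" field is stored but not indexed, so it never matches a search but comes back with the document, while "body" is indexed but not stored, so it matches searches but its text cannot be read back. That is the sense in which both kinds of data live "in the index" yet serve different purposes.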
