[Solr Wiki] Update of "SchemaDesign" by Lance Norskog

Apache Wiki Mon, 02 Feb 2009 01:43:17 -0800

The following page has been changed by Lance Norskog:
http://wiki.apache.org/solr/SchemaDesign

New page:
General tips & tricks in designing schemas.

Mapping databases to Solr.
A Solr index is effectively a single flat table. Storing a set of database 
tables in one index generally requires denormalizing some of those tables.
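
As a rough illustration, suppose a database has a 'book' table and an 'author' 
table joined many-to-many. One way to flatten them is one Solr document per 
book, with the author names copied into a multiValued field. All field names 
here are hypothetical, and the 'string' and 'text' types are assumed to be 
defined as in the Solr example schema:
{{{
 <field name="id" type="string" indexed="true" stored="true" required="true"/>
 <field name="title" type="text" indexed="true" stored="true"/>
 <!-- hypothetical: one value per matching row of the joined 'author' table -->
 <field name="authorName" type="string" indexed="true" stored="true" multiValued="true"/>
}}}
The cost of this layout is that an author's name is repeated in every book 
document, so changing one author row means reindexing each of those books.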

Sorting 
There are two ways of sorting available in Solr 1.4.

Lucene sort and field types:

The Solr sort parameter uses Lucene's sorting mechanism, which builds an array 
containing an entry for every document in the index and sorts against that 
array. The array is cached across requests, so repeated sorts are fast. If the 
field type is 'integer', the array holds only that int and thus takes 4 bytes * 
the number of documents; an index of 10 million documents, for example, needs 
about 40 MB. If the field type is anything else, this integer array is still 
created, and a separate array is created alongside it with much more data per 
entry (how much is unclear). Sorting is also slower if the type is not 
'integer'.

However, range checks do not work on an 'integer' field. If you want range 
checks and fast sorting, you can create a pair of fields, one of each type, 
with a copyField directive:
{{{
 <field name="popularity" type="sint" indexed="true" stored="true" multiValued="false"/>
 <field name="popularitySort" type="integer" indexed="true" stored="false" />
 ...
 <copyField source="popularity" dest="popularitySort"/>
}}}

Note that since multiValued=false is the default for these types, attempting to 
store a value to 'popularitySort' directly will cause an indexing error: the 
field always receives a value copied from 'popularity', so it would end up with 
two values. There is also no reason to store both fields, so 'popularitySort' 
is indexed but not stored.

Text search:

Phrase search:
If you index "To Be Or Not To Be" in a "text" field, none of these words will 
find this document, nor will the phrase in quotes. The problem is that the 
"text" field does not index the input as given, but an altered version: its 
analyzer typically removes stopwords and stems what is left, and every word in 
this phrase is a common English stopword. If you want phrase searches to work 
as well as searches on individual words, you need to have two fields. Both 
should be processed similarly, but the phrase search field should not use 
"stemming" or "stopwords".
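
A minimal sketch of such a pair follows. The names 'body', 'bodyPhrase' and 
'text_phrase' are made up for illustration, and 'text' is assumed to be the 
stemming, stopword-removing type from the Solr example schema. The phrase type 
keeps every word and only lowercases it:
{{{
 <fieldType name="text_phrase" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <!-- no StopFilterFactory and no stemmer: every input word is indexed -->
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>
 ...
 <field name="body" type="text" indexed="true" stored="true"/>
 <field name="bodyPhrase" type="text_phrase" indexed="true" stored="false"/>
 ...
 <copyField source="body" dest="bodyPhrase"/>
}}}
With this in place, a quoted query such as bodyPhrase:"to be or not to be" 
would be run against the phrase field, while ordinary word searches keep 
using 'body'.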

Phonemes:
Programmers are perfect spellers and expect the same of their users. A phoneme 
is (roughly) one distinct unit of sound in a spoken word. Phoneme-based 
searching matches words that sound alike, which can give users a better search 
experience when they misspell. Metaphone and the other phoneme filters cause 
the index to store phoneme-based representations of the text instead of the 
input, so a phoneme filter needs to be in both the index and query stacks. Of 
the several available, the DoubleMetaphone filter seems to be the most popular 
and does well with non-English text. 
([http://en.wikipedia.org/wiki/Soundex Soundex] was invented 90 years ago!)
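
One possible field type, sketched with Solr's PhoneticFilterFactory and its 
DoubleMetaphone encoder. The names 'text_phonetic', 'name' and 'namePhonetic' 
are hypothetical, and 'text' is assumed to come from the example schema. A 
single <analyzer> element with no type attribute applies to both indexing and 
querying, which puts the filter in both stacks; inject="false" replaces each 
token with its phonetic code rather than indexing both forms:
{{{
 <fieldType name="text_phonetic" class="solr.TextField">
   <analyzer>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <!-- replaces each token with its DoubleMetaphone encoding -->
     <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/>
   </analyzer>
 </fieldType>
 ...
 <field name="name" type="text" indexed="true" stored="true"/>
 <field name="namePhonetic" type="text_phonetic" indexed="true" stored="false"/>
 ...
 <copyField source="name" dest="namePhonetic"/>
}}}
Keeping the phonetic form in its own field leaves exact-spelling searches on 
'name' unaffected.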
