[Solr Wiki] Update of "SchemaDesign" by Lance Norskog

Apache Wiki Sun, 08 Feb 2009 23:54:44 -0800

The following page has been changed by Lance Norskog:
http://wiki.apache.org/solr/SchemaDesign

The comment on the change is:
Rewrite item about stemming/stopwords from "phrase search" to "raw text search"

------------------------------------------------------------------------------
  = Mapping databases to Solr =
  Solr provides one table. Storing a set of database tables in an index generally 
requires denormalizing some of the tables. Attempts to avoid denormalizing 
usually fail.
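
As a rough illustration of the denormalizing step, the sketch below flattens two small tables into one document per row. The table names, column names, and sample rows are all invented for the example; nothing here is Solr API.

```python
# Hypothetical normalized tables; names, columns, and rows are invented.
authors = {1: {"name": "Jane Doe"}, 2: {"name": "John Roe"}}
books = [
    {"id": 10, "title": "Search Basics", "author_id": 1},
    {"id": 11, "title": "Index Design", "author_id": 1},
    {"id": 12, "title": "Query Tuning", "author_id": 2},
]


def denormalize(books, authors):
    """Return one flat document per book, copying the author name into it."""
    docs = []
    for book in books:
        author = authors[book["author_id"]]
        docs.append(
            {
                "id": book["id"],
                "title": book["title"],
                # The author name is duplicated in every book by that author.
                "author_name": author["name"],
            }
        )
    return docs


docs = denormalize(books, authors)
```

Each resulting dictionary is the kind of flat record a single-table index can hold. The cost is that the author name is repeated, so a change to it means reindexing every affected document.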
  = Field contents =
- The more heterogeneous (different kinds of data) you have in one field or in 
one index, the less useful it is. For example, if you have text in different 
languages, it is more useful to store them in different fields: text_en, 
text_fr, etc. than all in one field. When you search against that one field 
English and French words and phrases will be searched with equal interest.
+ The more heterogeneous (different kinds of data) you have in one field or in 
one index, the less useful it is. For example, if you have text in different 
languages, it is more useful to store them in different fields: text_en, 
text_fr, etc. than all in one field. This way you can search for only English, 
only French etc.
  = Sorting =
  There are two ways of sorting available in Solr 1.4: Lucene's sorting feature 
and function queries.
  == Lucene Sorting ==
@@ -21, +21 @@

  There may be performance differences with this technique vs. the Lucene 
sorting algorithm.
  = Multiple Text Search Field types =
  The "text" field type in the example schema.xml provides basic text search 
for English text. But, it has a surprise: the actual text given to this field 
is not indexed as-is, and therefore searching for the raw text may not work. If 
you store "To Be Or Not To Be" in a "text" field, none of these words will 
find this document, nor will the phrase in quotes. The above words are all 
''stopwords'' and are stripped from the input text. Another transform is 
''stemming'', which stores both 'change' and 'changing' as the word 'chang'.  
Stemming is done at both index and query time, so a query of 'changing' will 
match a document containing 'change'.
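
The effect of the two transforms can be sketched with a toy analysis chain. This is not Solr's code: the stopword list is a small assumed sample, and the suffix stripping is a crude stand-in for a real stemmer, chosen only so that 'change' and 'changing' both reduce to 'chang'.

```python
# Toy stand-in for an analysis chain; NOT Solr's actual filters.
STOPWORDS = {"a", "an", "and", "be", "not", "of", "or", "the", "to"}  # assumed sample
SUFFIXES = ("ing", "es", "e", "s")  # crude suffix stripping, not a real stemmer


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Lowercase, split on whitespace, drop stopwords, then stem."""
    tokens = text.lower().split()
    return [toy_stem(t) for t in tokens if t not in STOPWORDS]


def matches(query, document):
    """True if every analyzed query term appears in the analyzed document."""
    doc_terms = set(analyze(document))
    query_terms = analyze(query)
    return bool(query_terms) and all(t in doc_terms for t in query_terms)
```

Under these assumptions, `analyze("To Be Or Not To Be")` yields no terms at all, so nothing can match it, while `matches("changing", "a change of plan")` succeeds because the same chain runs on both the document and the query.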
- == Phrase search ==
+ == Raw text search ==
- If you want to have any phrase search work as well as individual words, you 
need to have two fields. Both should be processed similarly, but the phrase 
search field should not use stemming or stopwords. 
+ If you search this text field for "changing" you will also find "changes". It 
is also useful to be able to search for the exact words as entered, especially 
when searching phrases. This requires a separate text field without the 
stemming and stopword filters. In that field, one document stores "changing" 
and another stores "changes", so a search for one does not match the other. 
  == Phonemes ==
  Programmers are perfect spellers and expect the same of their users. A 
''phoneme'' is (roughly) one distinct unit of sound in a spoken word. Phoneme-based 
searching can give users a better search experience. To support misspelled 
search words, phoneme filters cause the index to store phoneme-based 
representations of the text instead of the input. This only finds misspellings 
which sound like the original word.
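
To make the idea concrete, here is a minimal sketch of one well-known phonetic encoding, American Soundex. It is used purely as an illustration and is not necessarily the encoding a given phoneme filter applies; the function name is an assumption of this example.

```python
# Letter-to-digit table for American Soundex; vowels, h, w, and y have no code.
CODES = {}
for letters, digit in (
    ("bfpv", "1"),
    ("cgjkqsxz", "2"),
    ("dt", "3"),
    ("l", "4"),
    ("mn", "5"),
    ("r", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex code of a word ('' if no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    encoded = first.upper()
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (encoded + "000")[:4]
```

With this encoding, "Smith" and "Smyth" produce the same code and would land on the same index term, while a typo that drops a sounded consonant, such as "Rober" for "Robert", produces a different code and is not found.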
