I am using Solr to power faceting features for our application.

I know that Solr can do free-text search, but what is the best practice for
faceting on common terms inside Solr text fields?

For example, we have a large blob of text (a description of a property)
which contains useful terms to facet on, like 'city', 'formation', 'year',
'school', 'skill', and dozens more like these.

I would like to create a view which shows users the number of properties
containing each of these terms and lets them drill down to the relevant
properties.

One obvious solution is to pre-process the data: parse the text and create
a facet field for each of these key phrases with a boolean yes/no value.
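
To make the pre-processing idea concrete, here is a minimal sketch of what I
have in mind. The term list, the `has_` field prefix and the `term_flags`
helper are all made up for illustration; they are not part of Solr or of our
real schema.

```python
import re

# Hypothetical list of key terms we want to facet on.
KEY_TERMS = ["city", "formation", "year", "school", "skill"]

def term_flags(description, terms=KEY_TERMS):
    """Return one boolean field per key term for a single description.

    Matching is on whole lowercase words only, so "schools" would NOT
    match "school"; stemming is deliberately left out of this sketch.
    """
    words = set(re.findall(r"[a-z0-9]+", description.lower()))
    return {f"has_{term}": term in words for term in terms}

# Example: enrich a document (a plain dict here) before indexing it.
doc = {"id": "1", "description": "Close to the city centre and a good school."}
doc.update(term_flags(doc["description"]))
```

Each `has_*` field could then be indexed as a boolean and faceted on like any
other field, at the cost of maintaining the term list by hand.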

Ideally I'd like to automate this, and I imagine the Solr free-text search
engine might support it. For example, can I use the free-text search engine
to remove stop words and collect counts of common phrases, which we can then
present to the user?
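
Outside of Solr, the kind of counting I am after would look roughly like the
sketch below. The stop-word set and the `document_frequencies` helper are
assumptions for illustration only; it counts single words rather than
multi-word phrases, and it counts each term once per description so that the
numbers match "how many properties mention this term".

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "with"}

def document_frequencies(descriptions, stop_words=STOP_WORDS):
    """Count, for each term, how many descriptions contain it."""
    counts = Counter()
    for text in descriptions:
        # Use a set so a term repeated within one description counts once.
        tokens = set(re.findall(r"[a-z0-9]+", text.lower())) - stop_words
        counts.update(tokens)
    return counts

descriptions = [
    "A house in the city",
    "City school with a garden",
    "The school",
]
counts = document_frequencies(descriptions)
```

The most frequent entries in `counts` would be the candidate facet values to
show to the user.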

If pre-processing is the only way, is there a common or best-practice
approach to this, or are there any open source libraries which perform this
function?

What is the best practice for counting and grouping common phrases from a
text field in Solr?


