That is correct. Solr is a search engine, not a text analysis engine. There are a few open source text analysis systems: Weka, OpenNLP, UIMA.
Someone is working on integrating UIMA with Solr:
https://issues.apache.org/jira/browse/SOLR-2129

But you should generally assume you will have a batch processing pass
over the data before indexing it.

On Mon, Dec 6, 2010 at 12:04 PM, webdev1977 <webdev1...@gmail.com> wrote:
>
> Thanks for the quick response!
>
> I was thinking more about the idea of having both structured and unstructured
> data coming into a system to be indexed/searched. I would like these
> documents to be processed by some sort of entity/keyword/semantic
> processing. I have a well defined taxonomy for my organization (it is quite
> large) and at the moment we use RetrievalWare to give keyword/classification
> suggestions. This does NOT work well though, and RetrievalWare is pretty
> much useless to us.
>
> I want a way to do this process either at index time or search time. All
> documents should be processed against this taxonomy. I do not want the user
> to be able to nominate keywords, it must happen automatically. I am
> assuming it is only natural for these keywords/taxonomy entities to show up
> as hierarchical facets?
>
> From what I can tell, there is no way to tell Solr.. here is my taxonomy..
> classify my documents and give me back facets and facet counts..
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Taxonomy-and-Faceting-tp2028442p2029636.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

--
Lance Norskog
goks...@gmail.com
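
To make the "batch processing pass" concrete, here is a minimal sketch of what such a pass might look like. Everything in it is an assumption for illustration: the TAXONOMY contents, the "text" and "taxonomy" field names, and the naive phrase matching, which stands in for a real text analysis tool such as the ones named above. It writes one depth-prefixed token per taxonomy level, which is one common convention for getting hierarchical facets out of a plain multi-valued field; it is not something Solr does for you.

```python
import re

# Hypothetical taxonomy: each trigger phrase maps to a root-to-leaf path.
TAXONOMY = {
    "solar panel": ["Energy", "Renewable", "Solar"],
    "wind turbine": ["Energy", "Renewable", "Wind"],
    "pipeline": ["Energy", "Fossil", "Oil"],
}


def facet_tokens(path):
    """Return one depth-prefixed token per level of the path."""
    return [
        "%d/%s" % (depth, "/".join(path[: depth + 1]))
        for depth in range(len(path))
    ]


def classify(text, taxonomy=TAXONOMY):
    """Naive whole-phrase matching; a stand-in for a real classifier."""
    lowered = text.lower()
    tokens = set()
    for phrase, path in taxonomy.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            tokens.update(facet_tokens(path))
    return sorted(tokens)


def enrich(docs):
    """Batch pass: add a 'taxonomy' field to each document before indexing."""
    return [dict(doc, taxonomy=classify(doc["text"])) for doc in docs]
```

The enriched documents would then be indexed as usual, with "taxonomy" as a multi-valued string field. Faceting on that field with facet.prefix set to something like "1/Energy/" should return counts for the children of Energy only, so the user never nominates keywords by hand.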