[Solr Wiki] Update of "OpenNLP" by LanceXNorskog

Apache Wiki Sun, 26 Aug 2012 00:43:43 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.


The "OpenNLP" page has been changed by LanceXNorskog:
http://wiki.apache.org/solr/OpenNLP?action=diff&rev1=8&rev2=9

  Now, go to trunk-dir/solr and run 'ant test-contrib'. It compiles the OpenNLP 
lucene and solr code against the OpenNLP libraries and uses the small model 
files.
  
  === Deployment to Solr ===
- A Solr core requires schema types for the OpenNLP Tokenizer & Filter, and 
also requires model files.  The distribution includes a schema.xml file in 
solr/contrib/opennlp/src/test-files/opennlp/solr/conf/ which demonstrates 
OpenNLP-based analyzers. It does not contain other text types (to avoid falling 
out of date with the full text suite). You should copy the text types from this 
file into your test collection schema.xml, and download "real" models for 
testing. Also, you may have to add the OpenNLP lib directory to your solr/lib 
or solr/cores/collection/lib directory.
+ A Solr core requires schema types for the OpenNLP Tokenizer & Filter, and 
also requires "real" model files.  The distribution includes a schema.xml file 
in solr/contrib/opennlp/src/test-files/opennlp/solr/conf/ which demonstrates 
OpenNLP-based analyzers. It does not contain other text types (to avoid falling 
out of date with the full text suite). You should copy the text types from this 
file into your test collection schema.xml, and download "real" models for 
testing. Also, you may have to add the OpenNLP lib directory to your solr/lib 
or solr/cores/collection/lib directory. The text types assume that 
cores/collection/conf/opennlp contains the OpenNLP model files. 
  
- Now, download these model files to 
solr/contrib/opennlp/src/test-files/opennlp/solr/conf/opennlp/
+ The following server hosts "real" models from the OpenNLP project. Download 
the model files to your solr/cores/collection/conf/opennlp directory.
  
   * http://opennlp.sourceforge.net/models-1.5/
    * The English-language models start with 'en'. The 'maxent' models are 
preferred to the 'perceptron' models.
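
The download step above could be scripted roughly as follows. This is only a sketch: the base URL and the target directory come from the text above, but the three file names in `ASSUMED_MODELS` and the helper names `plan_downloads` and `download_models` are assumptions made for illustration. Check the server's directory listing for the models your field types actually reference.

```python
from pathlib import Path
from urllib.parse import urljoin
from urllib.request import urlretrieve

# Model server named on the wiki page.
MODEL_BASE_URL = "http://opennlp.sourceforge.net/models-1.5/"

# Assumption: example English ('en') model files; 'maxent' is chosen over
# 'perceptron' as the page recommends. Verify the names against the listing.
ASSUMED_MODELS = ["en-sent.bin", "en-token.bin", "en-pos-maxent.bin"]


def plan_downloads(target_dir, names=ASSUMED_MODELS):
    """Return (url, destination path) pairs without touching the network."""
    target = Path(target_dir)
    return [(urljoin(MODEL_BASE_URL, name), target / name) for name in names]


def download_models(target_dir, names=ASSUMED_MODELS):
    """Fetch each model into target_dir, skipping files already present."""
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    for url, dest in plan_downloads(target_dir, names):
        if not dest.exists():
            urlretrieve(url, str(dest))
```

Calling `download_models("solr/cores/collection/conf/opennlp")` would place the files where the text types expect them; `plan_downloads` alone can be used to review the URLs first.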
  
- Your Solr should start without any Exceptions. At this point, go to the 
Schema analyzer, pick the 'text_opennlp_pos' field type, and post a sentence or 
two to the analyzer. You should get text tokenized with payloads. 
Unfortunately, the analysis page shows them as bytes instead of text. If you 
would like this in text form, then go vote on SOLR-3493.
+ Your Solr should start without any Exceptions. At this point, go to the 
Schema analyzer, pick the 'text_opennlp_pos' field type, and post a sentence or 
two to the analyzer. You should get text tokenized with payloads. 
Unfortunately, the analysis page shows them as bytes instead of text. If you 
would like to see them in text form, then go vote on SOLR-3493 (or implement 
it).
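
The same check can probably be made without the admin page by querying Solr's field analysis request handler. The sketch below only builds the request URL. The base URL in the usage note is hypothetical, and it assumes the `/analysis/field` handler is registered in your solrconfig.xml; `analysis_url` is a helper name invented here.

```python
from urllib.parse import urlencode


def analysis_url(base_url, field_type, text):
    """Build a field-analysis request URL for one field type and sample text."""
    params = urlencode({
        "analysis.fieldtype": field_type,   # e.g. 'text_opennlp_pos'
        "analysis.fieldvalue": text,        # the sentence(s) to analyze
        "wt": "json",
    })
    return f"{base_url.rstrip('/')}/analysis/field?{params}"
```

For example, `analysis_url("http://localhost:8983/solr/collection1", "text_opennlp_pos", "Hello world")` yields a URL that can be opened in a browser or fetched with `urllib.request.urlopen`. The payloads in the response will presumably be shown as bytes as well, as described above.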
  
  == Licensing ==
  The OpenNLP library is Apache-licensed. The 'jwnl' library is 'BSD-like'.
