keywordlinkingengine.mdtext

rwesten Thu, 22 Sep 2011 08:44:58 -0700

Author: rwesten
Date: Thu Sep 22 15:44:32 2011
New Revision: 1174220

URL: http://svn.apache.org/viewvc?rev=1174220&view=rev
Log:
next try to get nested lists


Modified:
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/keywordlinkingengine.mdtext

Modified: incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/keywordlinkingengine.mdtext
URL: http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/keywordlinkingengine.mdtext?rev=1174220&r1=1174219&r2=1174220&view=diff
==============================================================================
--- incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/keywordlinkingengine.mdtext (original)
+++ incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/keywordlinkingengine.mdtext Thu Sep 22 15:44:32 2011
@@ -13,13 +13,13 @@ The following list provides a short over
 * **Multi-lingual labels of the controlled vocabulary:** Entities are matched based on labels of the current language and labels without any defined language; e.g. English labels will not be matched against German language texts. Therefore it is important to have a controlled vocabulary that includes labels in the language of the texts you want to enhance.
 * **Natural Language Processing support:** The KeywordLinkingEngine is able to use [Sentence Detectors](http://opennlp.sourceforge.net/api/opennlp/tools/sentdetect/SentenceDetector.html), [POS (Part of Speech) taggers](http://opennlp.sourceforge.net/api/opennlp/tools/postag/POSTagger.html) and [Chunkers](http://opennlp.sourceforge.net/api/opennlp/tools/chunker/Chunker.html). If such components are available for a language, they are used to optimize the enhancement process.
   
-  * **Sentence detector:** If a sentence detector is present, the memory footprint of the engine is reduced, because Tokens, POS tags and Chunks are only kept for the currently active sentence. If no sentence detector is available, the entire content is treated as a single sentence.
+    * **Sentence detector:** If a sentence detector is present, the memory footprint of the engine is reduced, because Tokens, POS tags and Chunks are only kept for the currently active sentence. If no sentence detector is available, the entire content is treated as a single sentence.
   
-  * **Tokenizer:** A (word) [tokenizer](http://opennlp.sourceforge.net/api/opennlp/tools/tokenize/Tokenizer.html) is required for the enhancement process. If no specific tokenizer is available for a given language, the [OpenNLP SimpleTokenizer](http://opennlp.sourceforge.net/api/opennlp/tools/tokenize/SimpleTokenizer.html) is used as the default. How well this tokenizer works depends on the language.
+    * **Tokenizer:** A (word) [tokenizer](http://opennlp.sourceforge.net/api/opennlp/tools/tokenize/Tokenizer.html) is required for the enhancement process. If no specific tokenizer is available for a given language, the [OpenNLP SimpleTokenizer](http://opennlp.sourceforge.net/api/opennlp/tools/tokenize/SimpleTokenizer.html) is used as the default. How well this tokenizer works depends on the language.
   
-  * **POS tagger:** POS (Part-of-Speech) taggers annotate tokens with their type. Because the KeywordLinkingEngine is only interested in Nouns, Foreign Words and Numbers, the presence of such a tagger allows it to skip many of the tokens and thereby improves performance. However, POS taggers use different sets of tags for different languages. Because of that, it is not enough that a POS tagger is available for a language; there MUST also be a configuration of the POS tags representing Nouns.
+    * **POS tagger:** POS (Part-of-Speech) taggers annotate tokens with their type. Because the KeywordLinkingEngine is only interested in Nouns, Foreign Words and Numbers, the presence of such a tagger allows it to skip many of the tokens and thereby improves performance. However, POS taggers use different sets of tags for different languages. Because of that, it is not enough that a POS tagger is available for a language; there MUST also be a configuration of the POS tags representing Nouns.
   
-  * **Chunker:** There are two types of Chunkers: first, the [Chunkers](http://opennlp.sourceforge.net/api/opennlp/tools/chunker/Chunker.html) provided by OpenNLP (based on statistical models), and second, a [POS tag based Chunker](http://svn.apache.org/repos/asf/incubator/stanbol/trunk/commons/opennlp/src/main/java/org/apache/stanbol/commons/opennlp/PosTypeChunker.java) provided by the OpenNLP bundle of Stanbol. Currently the availability of a Chunker has no big influence on either the performance or the quality of the Enhancements.
+    * **Chunker:** There are two types of Chunkers: first, the [Chunkers](http://opennlp.sourceforge.net/api/opennlp/tools/chunker/Chunker.html) provided by OpenNLP (based on statistical models), and second, a [POS tag based Chunker](http://svn.apache.org/repos/asf/incubator/stanbol/trunk/commons/opennlp/src/main/java/org/apache/stanbol/commons/opennlp/PosTypeChunker.java) provided by the OpenNLP bundle of Stanbol. Currently the availability of a Chunker has no big influence on either the performance or the quality of the Enhancements.
 
 * **Configuration:** The set of languages to be annotated can be configured for the KeywordLinkingEngine. An empty configuration indicates that texts in any language should be processed. This makes it possible to run several KeywordLinkingEngine instances for different languages (e.g. each with a different configuration).
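The two language rules described in this diff (labels are used only if their language matches the text or is undefined; an empty language configuration means "process any language") can be sketched in plain Java. This is a hypothetical illustration, not Stanbol's implementation: the `Label` record and the `isProcessed` and `matchableLabels` methods are assumptions made for this example.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch, not Stanbol code: the language rules of the KeywordLinkingEngine.
public class LanguageFilterSketch {

    /** Assumed type: a vocabulary label; lang == null means "no language defined". */
    record Label(String text, String lang) {}

    /** Engine-level check: an empty configured set means "process texts in any language". */
    static boolean isProcessed(Set<String> configuredLanguages, String textLanguage) {
        return configuredLanguages.isEmpty() || configuredLanguages.contains(textLanguage);
    }

    /** Labels usable for matching: those in the text's language or without a language. */
    static List<Label> matchableLabels(List<Label> labels, String textLanguage) {
        return labels.stream()
                .filter(label -> label.lang() == null || label.lang().equals(textLanguage))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Label> labels = List.of(
                new Label("Vienna", "en"),
                new Label("Wien", "de"),
                new Label("UNO", null));
        // German text: the English label is skipped, the untagged label is kept.
        System.out.println(matchableLabels(labels, "de"));
        // Empty configuration: any language is processed.
        System.out.println(isProcessed(Set.of(), "de"));
        // Only English configured: a German text is not processed by this instance.
        System.out.println(isProcessed(Set.of("en"), "de"));
    }
}
```

Running one instance per configured language set is what allows different engine configurations per language, as the Configuration bullet notes.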

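The POS tagger and sentence detector notes in the diff can be illustrated with a small stdlib-only Java sketch. It assumes a hypothetical per-language map of "linkable" tags (the English entry uses the Penn Treebank tags for nouns `NN`, `NNS`, `NNP`, `NNPS`, foreign words `FW` and cardinal numbers `CD`); this is not the engine's actual configuration format. Tokens are only skipped when both POS tags and a tag configuration for the language exist, and the filter only ever sees the tokens of one sentence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Stanbol code: POS-based token filtering, one sentence at a time.
public class PosFilterSketch {

    // Assumed configuration: per language, the POS tags of tokens worth looking up.
    // English uses Penn Treebank tags: nouns, foreign words (FW) and cardinal numbers (CD).
    static final Map<String, Set<String>> LINKABLE_TAGS = Map.of(
            "en", Set.of("NN", "NNS", "NNP", "NNPS", "FW", "CD"));

    /**
     * Returns the tokens of one sentence that should be looked up in the vocabulary.
     * Without tags (no POS tagger) or without a tag configuration for the language,
     * nothing can be skipped, so all tokens are returned.
     */
    static List<String> linkableTokens(String language, String[] tokens, String[] tags) {
        Set<String> allowed = (tags == null) ? null : LINKABLE_TAGS.get(language);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (allowed == null || allowed.contains(tags[i])) {
                result.add(tokens[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Each row stands for one sentence as a sentence detector would deliver it;
        // linkableTokens only ever receives the tokens and tags of a single sentence.
        String[][] tokens = {{"Paris", "is", "a", "city"}, {"It", "has", "2", "airports"}};
        String[][] tags = {{"NNP", "VBZ", "DT", "NN"}, {"PRP", "VBZ", "CD", "NNS"}};
        for (int s = 0; s < tokens.length; s++) {
            System.out.println(linkableTokens("en", tokens[s], tags[s]));
        }
        // German has no tag configuration in this sketch, so no token is skipped.
        System.out.println(linkableTokens("de", new String[] {"Paris", "ist", "alt"},
                new String[] {"NE", "VAFIN", "ADJD"}));
    }
}
```

Treating "tagger present but no tag configuration" the same as "no tagger" mirrors the requirement that a POS tagger alone is not enough to skip tokens.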
svn commit: r1174220 - /incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/engines/keywordlinkingengine.mdtext
