Rupert Westenthaler created STANBOL-733:
-------------------------------------------

Summary: Stanbol NLP processing
Key: STANBOL-733
URL: https://issues.apache.org/jira/browse/STANBOL-733
Project: Stanbol
Issue Type: New Feature
Components: Enhancer
Reporter: Rupert Westenthaler

This issue covers the NLP processing components as discussed in
http://markmail.org/message/qxusiup3mim2lhpx

Goals
=====

1. Provide a modular infrastructure for NLP-related things

Many tasks in NLP can be computationally intensive, and there is no "one size fits all" NLP approach to analysing text. Therefore, we wanted an NLP infrastructure that can be configured and wired together as needed for the specific use case, with several specialised modules that can build upon each other, many of which are optional.

2. Provide a unified data model for representing NLP text annotations

In many scenarios, it will be necessary to implement custom engines that build on the results of a previous "generic" analysis of the text (e.g. POS tagging and chunking). For example, in one project we identify so-called "noun phrases", use a lemmatizer to build the base form, and then convert this to the singular nominative form to obtain a grammatically correct label for use in a tag cloud. Most of this builds on generic NLP functionality, but the last step is very specific to the use case. Therefore, we also wanted to implement a generic NLP data model that allows representing text annotations attached to individual words or to spans of words.
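
To make the second goal more concrete, here is a minimal sketch of what such a data model could look like. It is written in Python purely for brevity and is not the Stanbol API: the names `Span`, `AnalysedText`, `add_token`, `add_chunk` and `tokens_in` are hypothetical, as are the annotation keys (`pos`, `lemma`, `type`) and the hand-written tags. The generic part stores annotations on words and on spans of words; the last few lines stand in for a use-case-specific engine that builds a tag-cloud label from a noun phrase.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Span:
    """A region of the text given by character offsets (hypothetical type)."""
    start: int  # inclusive
    end: int    # exclusive
    annotations: Dict[str, Any] = field(default_factory=dict)

    def text(self, source: str) -> str:
        return source[self.start:self.end]


@dataclass
class AnalysedText:
    """The text plus word-level and phrase-level spans (hypothetical type)."""
    source: str
    tokens: List[Span] = field(default_factory=list)
    chunks: List[Span] = field(default_factory=list)

    def add_token(self, start: int, end: int, **annotations: Any) -> Span:
        span = Span(start, end, dict(annotations))
        self.tokens.append(span)
        return span

    def add_chunk(self, start: int, end: int, **annotations: Any) -> Span:
        span = Span(start, end, dict(annotations))
        self.chunks.append(span)
        return span

    def tokens_in(self, chunk: Span) -> List[Span]:
        """Return the tokens fully covered by the given chunk."""
        return [t for t in self.tokens
                if chunk.start <= t.start and t.end <= chunk.end]


# "Generic" analysis results, written by hand here instead of by real engines.
text = AnalysedText("The big cities are growing")
text.add_token(0, 3, pos="DT")
text.add_token(4, 7, pos="JJ", lemma="big")
text.add_token(8, 14, pos="NNS", lemma="city")
text.add_token(15, 18, pos="VBP")
text.add_token(19, 26, pos="VBG")
np = text.add_chunk(0, 14, type="NP")

# Use-case-specific step: build a label from the lemmas of the noun phrase,
# skipping determiners and falling back to the surface form if no lemma is set.
label = " ".join(
    t.annotations.get("lemma", t.text(text.source))
    for t in text.tokens_in(np)
    if t.annotations.get("pos") != "DT"
)
print(label)
```

The point of the sketch is the separation: the span and annotation containers know nothing about tag clouds, so a custom engine can be added on top without changing the generic model.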