Rupert Westenthaler created STANBOL-733:
-------------------------------------------

             Summary: Stanbol NLP processing
                 Key: STANBOL-733
                 URL: https://issues.apache.org/jira/browse/STANBOL-733
             Project: Stanbol
          Issue Type: New Feature
          Components: Enhancer
            Reporter: Rupert Westenthaler


This issue covers the NLP processing components as discussed in 
http://markmail.org/message/qxusiup3mim2lhpx

Goals
=====

1. provide a modular infrastructure for NLP-related tasks

Many tasks in NLP can be computationally intensive, and there is no "one size
fits all" NLP approach when analysing text. Therefore, we wanted to have an NLP
infrastructure that can be configured and wired together as needed for the
specific use case, with several specialised modules that can build upon each
other, many of which are optional.
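
As a rough illustration of what "configured and wired together" could mean,
the sketch below lets each module declare which annotation layers it reads and
which it adds, and rejects a chain in which a module's inputs are missing. All
names here (NlpModule, SimpleModule, NlpChain, the layer strings) are
hypothetical and chosen for this example only; they are not taken from the
Stanbol code base or from the linked discussion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical module contract for illustration; not an existing Stanbol API.
interface NlpModule {
    String name();
    Set<String> requires(); // annotation layers this module reads
    Set<String> provides(); // annotation layers this module adds
}

// Minimal implementation so the sketch runs without any real NLP code.
final class SimpleModule implements NlpModule {
    private final String name;
    private final Set<String> requires;
    private final Set<String> provides;

    SimpleModule(String name, Set<String> requires, Set<String> provides) {
        this.name = name;
        this.requires = requires;
        this.provides = provides;
    }

    public String name() { return name; }
    public Set<String> requires() { return requires; }
    public Set<String> provides() { return provides; }
}

// Wires modules in order; a module is rejected if its inputs are not available yet.
final class NlpChain {
    private final List<NlpModule> modules = new ArrayList<>();
    // Assumption: the raw text is the only layer available before any module runs.
    private final Set<String> available = new HashSet<>(Arrays.asList("text"));

    static Set<String> layers(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    NlpChain add(NlpModule module) {
        if (!available.containsAll(module.requires())) {
            throw new IllegalStateException(module.name() + " needs "
                + module.requires() + " but only " + available + " is available");
        }
        modules.add(module);
        available.addAll(module.provides());
        return this;
    }

    List<String> moduleNames() {
        List<String> names = new ArrayList<>();
        for (NlpModule m : modules) {
            names.add(m.name());
        }
        return names;
    }
}

final class NlpChainDemo {
    public static void main(String[] args) {
        NlpChain chain = new NlpChain()
            .add(new SimpleModule("tokenizer",
                NlpChain.layers("text"), NlpChain.layers("token")))
            .add(new SimpleModule("pos-tagger",
                NlpChain.layers("token"), NlpChain.layers("pos")));
        // A chunker is optional: a use case that only needs POS tags leaves it out.
        System.out.println(chain.moduleNames());
    }
}
```

Declaring inputs and outputs per module is only one possible design; it keeps
optional modules cheap to omit while still catching a chain that, for example,
adds a chunker without a POS tagger in front of it.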

2. provide a unified data model for representing NLP text annotations

In many scenarios, it will be necessary to implement custom engines building on
the results of a previous "generic" analysis of the text (e.g. POS tagging and
chunking). For example, in one project we identify so-called "noun phrases",
use a lemmatizer to build the base form, then convert this to singular
nominative form to have a grammatically correct label to use in a tag cloud.
Most of this builds on generic NLP functionality, but the last step is very
specific to the use case.

Therefore, we also wanted to implement a generic NLP data model that allows
representing text annotations attached to individual words or to spans of
words.
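
A minimal sketch of such a data model is given below, assuming that both words
and multi-word spans are addressed by character offsets into the text and carry
a key/value map of annotations. The type names (TextSpan, AnnotatedText) and
the annotation keys ("pos", "lemma", "chunk") are hypothetical and only serve
this example; they do not describe an agreed design.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration only; not the Stanbol API.
final class TextSpan {
    final int start; // inclusive character offset
    final int end;   // exclusive character offset
    private final Map<String, Object> annotations = new LinkedHashMap<>();

    TextSpan(int start, int end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid offsets");
        }
        this.start = start;
        this.end = end;
    }

    // Attach an annotation; returns this span so calls can be chained.
    TextSpan annotate(String key, Object value) {
        annotations.put(key, value);
        return this;
    }

    Object annotation(String key) {
        return annotations.get(key);
    }
}

final class AnnotatedText {
    private final String text;
    private final List<TextSpan> spans = new ArrayList<>();

    AnnotatedText(String text) {
        this.text = text;
    }

    // A single word and a span of several words are created the same way.
    TextSpan addSpan(int start, int end) {
        if (end > text.length()) {
            throw new IllegalArgumentException("span exceeds text");
        }
        TextSpan span = new TextSpan(start, end);
        spans.add(span);
        return span;
    }

    String covered(TextSpan span) {
        return text.substring(span.start, span.end);
    }

    // Spans fully contained in 'outer', e.g. the words inside a noun phrase.
    List<TextSpan> spansWithin(TextSpan outer) {
        List<TextSpan> result = new ArrayList<>();
        for (TextSpan s : spans) {
            if (s != outer && s.start >= outer.start && s.end <= outer.end) {
                result.add(s);
            }
        }
        return result;
    }
}

final class SpanModelDemo {
    public static void main(String[] args) {
        AnnotatedText text = new AnnotatedText("The red cars stopped.");
        // Word-level annotation, as a POS tagger and lemmatizer might add it.
        TextSpan cars = text.addSpan(8, 12)
            .annotate("pos", "NNS").annotate("lemma", "car");
        // Span-level annotation, as a chunker might add it.
        TextSpan np = text.addSpan(0, 12).annotate("chunk", "NP");
        System.out.println(text.covered(np) + " contains "
            + text.spansWithin(np).size() + " annotated word(s), lemma of '"
            + text.covered(cars) + "': " + cars.annotation("lemma"));
    }
}
```

With a shared model along these lines, a use-case-specific engine such as the
tag-cloud label builder described above could read the chunk and lemma
annotations written by generic modules without depending on how they were
produced.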



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
