Document Analyzer not showing PersonTitle when running with xml tagged source
-----------------------------------------------------------------------------

                 Key: UIMA-958
                 URL: https://issues.apache.org/jira/browse/UIMA-958
             Project: UIMA
          Issue Type: Bug
          Components: Tools
            Reporter: Marshall Schor
            Assignee: Marshall Schor
            Priority: Minor
             Fix For: 2.2.2


Running the Document Analyzer on XML input documents and specifying the XML
tag to analyze (TEXT) causes the PersonTitle annotator to show no results.
This is traced to the annotator being given a language of x-unspecified, even
though the collection reader set the document language to "en".

The cause is the XmlDetagger that is inserted into the aggregate. It passes a
"plain text" CAS view to the PersonTitle annotator, but does not copy the
language specifier from its input CAS view into that view. In fact, the input
CAS view is "xmlDocument", and that view does not have the language either.
The collection reader puts the language only into the "_InitialView".

Fix by having the XmlDetagger copy the initial view's language into the plain
text view it creates for downstream annotators to work on.
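
The fix described above can be illustrated with a minimal, self-contained 
sketch. It does not use the UIMA API: the Cas and View classes below are 
hypothetical stand-ins that model only named views with a text and a language, 
and the view name "plainTextDocument" and the regex-based detagging are 
assumptions made for illustration. In the real XmlDetagger the equivalent step 
would presumably go through the CAS view's getDocumentLanguage() and 
setDocumentLanguage() methods.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: models the language-copy fix without depending on UIMA.
class LanguageCopySketch {
    static final String INITIAL_VIEW = "_InitialView";
    static final String XML_VIEW = "xmlDocument";
    // Assumed name for the "plain text" view mentioned in the report.
    static final String PLAIN_TEXT_VIEW = "plainTextDocument";
    static final String UNSPECIFIED = "x-unspecified";

    // Hypothetical stand-in for a CAS view: document text plus language.
    static final class View {
        String text = "";
        String language = UNSPECIFIED;
    }

    // Hypothetical stand-in for a CAS that holds named views.
    static final class Cas {
        private final Map<String, View> views = new HashMap<>();

        View createView(String name) {
            View view = new View();
            views.put(name, view);
            return view;
        }

        View getView(String name) {
            View view = views.get(name);
            if (view == null) {
                throw new IllegalArgumentException("No such view: " + name);
            }
            return view;
        }
    }

    // Strips tags from the XML view into a new plain text view.
    // With copyLanguage == false this reproduces the reported behavior;
    // with copyLanguage == true it applies the proposed fix.
    static View detag(Cas cas, boolean copyLanguage) {
        View xml = cas.getView(XML_VIEW);
        View plain = cas.createView(PLAIN_TEXT_VIEW);
        plain.text = xml.text.replaceAll("<[^>]*>", ""); // naive detagging
        if (copyLanguage) {
            // The fix: take the language from the initial view, because the
            // collection reader sets it only there.
            plain.language = cas.getView(INITIAL_VIEW).language;
        }
        return plain;
    }

    public static void main(String[] args) {
        Cas cas = new Cas();
        cas.createView(INITIAL_VIEW).language = "en";
        cas.createView(XML_VIEW).text = "<TEXT>Dr. Smith</TEXT>";

        System.out.println("without copy: " + detag(cas, false).language);
        System.out.println("with copy:    " + detag(cas, true).language);
    }
}
```

Without the copy, the plain text view keeps the default x-unspecified 
language; with the copy, it carries the "en" set by the collection reader.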

Note that there are two copies of this class (one in tools, the other in 
examples); fix both of them.
