
[jira] [Created] (STANBOL-613) Define a standard way on how to obtain the extracted language

Rupert Westenthaler (JIRA) Tue, 15 May 2012 02:31:52 -0700

Issue Type:	Sub-task
Assignee:	Rupert Westenthaler
Components:	Enhancer
Created:	15/May/12 09:29
Description:	With the addition of the CELI Language Identification Engine, there are now two different engines that support the same feature. However, engines that consume the detected language are currently "hard coded" to the LangId Engine (enhancer/engines/langid). This needs to change so that alternatives, such as the CELI-based implementation, can be adopted.

The suggestion is to use the following patterns to extract the language:

(1) via annotations:

    ?x rdf:type fise:TextAnnotation .
    ?x dc:language ?language .
    OPTIONAL { ?x dc:creator ?engine }
    OPTIONAL { ?x fise:confidence ?confidence }

(2) via ContentItem metadata:

    ?ci dc:language ?language

Pattern (2) is used as a fallback if (1) delivers no results.

The following methods are added to the EnhancementEngineHelper utility of the enhancer.servicesapi module:
* extract the language with the highest confidence, including the fallback to (2)
* extract all languages sorted by confidence, including the fallback to (2)
* extract all TextAnnotations with dc:language values
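The selection logic behind the three proposed helper methods can be illustrated with a small, dependency-free Java sketch. It is not the EnhancementEngineHelper API: the real helpers would work on the ContentItem's RDF metadata, whereas here the class LanguageExtractionSketch, the nested LanguageAnnotation type and all method signatures are hypothetical, and the results of pattern (1) are modelled as a plain list. Removing duplicate languages reported by more than one engine is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical sketch of the language extraction pattern (not the Stanbol API).
 * Pattern (1) results are modelled as LanguageAnnotation objects; pattern (2)
 * is modelled as a single, possibly null, ContentItem-level language string.
 */
final class LanguageExtractionSketch {

    /** One fise:TextAnnotation as matched by pattern (1). */
    static final class LanguageAnnotation {
        final String language;   // dc:language value (null if absent)
        final Double confidence; // fise:confidence (OPTIONAL, may be null)
        final String engine;     // engine that created the annotation (OPTIONAL, may be null)

        LanguageAnnotation(String language, Double confidence, String engine) {
            this.language = language;
            this.confidence = confidence;
            this.engine = engine;
        }
    }

    // Highest confidence first; annotations without a confidence sort last.
    private static final Comparator<LanguageAnnotation> BY_CONFIDENCE_DESC =
        Comparator.comparing((LanguageAnnotation a) -> a.confidence,
            Comparator.nullsLast(Comparator.<Double>reverseOrder()));

    /** Method 3: all TextAnnotations that carry a dc:language value. */
    static List<LanguageAnnotation> getLanguageAnnotations(List<LanguageAnnotation> annotations) {
        List<LanguageAnnotation> result = new ArrayList<>();
        for (LanguageAnnotation a : annotations) {
            if (a.language != null) {
                result.add(a);
            }
        }
        return result;
    }

    /** Method 2: all languages sorted by confidence, falling back to pattern (2). */
    static List<String> getLanguages(List<LanguageAnnotation> annotations, String contentItemLanguage) {
        List<LanguageAnnotation> sorted = getLanguageAnnotations(annotations);
        sorted.sort(BY_CONFIDENCE_DESC); // stable sort keeps input order for ties
        List<String> languages = new ArrayList<>();
        for (LanguageAnnotation a : sorted) {
            // Assumption: keep each language once, at its highest-confidence position.
            if (!languages.contains(a.language)) {
                languages.add(a.language);
            }
        }
        if (languages.isEmpty() && contentItemLanguage != null) {
            languages.add(contentItemLanguage); // fallback: ?ci dc:language ?language
        }
        return languages;
    }

    /** Method 1: the language with the highest confidence, falling back to pattern (2). */
    static Optional<String> getLanguage(List<LanguageAnnotation> annotations, String contentItemLanguage) {
        List<String> languages = getLanguages(annotations, contentItemLanguage);
        return languages.isEmpty() ? Optional.empty() : Optional.of(languages.get(0));
    }
}
```

For example, if the LangId engine reports "en" with confidence 0.9 and the CELI engine reports "de" with confidence 0.3, getLanguage returns "en" and getLanguages returns ["en", "de"]; with no language annotations at all, both fall back to the ContentItem-level language, if one is set.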
Fix Versions:	0.10.0-incubating
Project:	Stanbol
Priority:	Minor
Reporter:	Rupert Westenthaler
