Inefficient SPARQL query in
org.apache.stanbol.enhancer.jersey.resource.ContentItemResource
-------------------------------------------------------------------------------------------
Key: STANBOL-342
URL: https://issues.apache.org/jira/browse/STANBOL-342
Project: Stanbol
Issue Type: Improvement
Reporter: Rupert Westenthaler
Assignee: Rupert Westenthaler
The SPARQL query:
PREFIX enhancer: <http://fise.iks-project.eu/ontology/>
PREFIX dc: <http://purl.org/dc/terms/>
SELECT ?textAnnotation ?text ?entity ?entity_label ?confidence
WHERE {
  ?textAnnotation a enhancer:TextAnnotation .
  ?textAnnotation dc:type ?type .
  ?textAnnotation enhancer:selected-text ?text .
  OPTIONAL {
    ?entityAnnotation dc:relation ?textAnnotation .
    ?entityAnnotation a enhancer:EntityAnnotation .
    ?entityAnnotation enhancer:entity-reference ?entity .
    ?entityAnnotation enhancer:entity-label ?entity_label .
    ?entityAnnotation enhancer:confidence ?confidence .
  }
}
ORDER BY ?text
is very inefficient when run against the in-memory RDF model returned by
ContentItem.getMetadata.
On a content item enhanced with about 150 TextAnnotations and 200
EntityAnnotations, executing this query for all types supported by the UI
(Person, Organizations, Places, Concepts and Others) took about 20 seconds,
while the enhancement process itself took about 1 second.
I know this part is only used for the web interface at
"http://{stanbol-instance}/engines" and therefore does not influence the
performance of the RESTful services.
However, this interface is usually the first point of contact that potential
users have with Apache Stanbol. It is therefore very likely that people get a
wrong impression of Stanbol's performance if they try to parse longer texts
that result in a lot of Enhancements.
Because of that, I will replace the current implementation with another one
that does not require the use of SPARQL.
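
A minimal sketch of what a SPARQL-free replacement could look like, not the
code that will actually be committed. It assumes the metadata can be read as
plain (subject, predicate, object) string triples; the AnnotationIndex class,
the String[] triple representation and the prefixed names used in place of
full URIs are all hypothetical and are not part of the Stanbol or Clerezza
API. The idea is to index the triples once and then resolve the join between
TextAnnotations and EntityAnnotations with map lookups:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: indexes triples once, then answers the query by lookup.
final class AnnotationIndex {
    // Prefixed names stand in for full URIs to keep the sketch short.
    static final String TYPE = "rdf:type";
    static final String DC_TYPE = "dc:type";
    static final String DC_RELATION = "dc:relation";
    static final String TEXT_ANNOTATION = "enhancer:TextAnnotation";
    static final String ENTITY_ANNOTATION = "enhancer:EntityAnnotation";
    static final String SELECTED_TEXT = "enhancer:selected-text";
    static final String ENTITY_REFERENCE = "enhancer:entity-reference";
    static final String ENTITY_LABEL = "enhancer:entity-label";
    static final String CONFIDENCE = "enhancer:confidence";

    // subject -> predicate -> objects, filled in a single pass
    private final Map<String, Map<String, List<String>>> bySubject = new HashMap<>();
    // text annotation -> entity annotations pointing at it via dc:relation
    private final Map<String, List<String>> entitiesByText = new HashMap<>();

    // Each triple is assumed to be a String[] {subject, predicate, object}.
    AnnotationIndex(List<String[]> triples) {
        for (String[] t : triples) {
            bySubject.computeIfAbsent(t[0], k -> new HashMap<>())
                     .computeIfAbsent(t[1], k -> new ArrayList<>())
                     .add(t[2]);
            if (DC_RELATION.equals(t[1])) {
                entitiesByText.computeIfAbsent(t[2], k -> new ArrayList<>()).add(t[0]);
            }
        }
    }

    private List<String> values(String subject, String predicate) {
        return bySubject.getOrDefault(subject, Collections.emptyMap())
                        .getOrDefault(predicate, Collections.emptyList());
    }

    private String first(String subject, String predicate) {
        List<String> v = values(subject, predicate);
        return v.isEmpty() ? null : v.get(0);
    }

    // Returns rows of {text, entity, label, confidence} sorted by text.
    // The entity columns are null when no complete EntityAnnotation exists,
    // mirroring the OPTIONAL block. Simplification: only the first value of
    // each property is used, whereas SPARQL would return every combination.
    List<String[]> rows(String dcType) {
        List<String[]> rows = new ArrayList<>();
        for (String ta : bySubject.keySet()) {
            if (!values(ta, TYPE).contains(TEXT_ANNOTATION)
                    || !values(ta, DC_TYPE).contains(dcType)) {
                continue;
            }
            String text = first(ta, SELECTED_TEXT);
            if (text == null) {
                continue; // selected-text is a required pattern in the query
            }
            boolean matched = false;
            for (String ea : entitiesByText.getOrDefault(ta, Collections.emptyList())) {
                String entity = first(ea, ENTITY_REFERENCE);
                String label = first(ea, ENTITY_LABEL);
                String confidence = first(ea, CONFIDENCE);
                if (!values(ea, TYPE).contains(ENTITY_ANNOTATION)
                        || entity == null || label == null || confidence == null) {
                    continue;
                }
                rows.add(new String[] {text, entity, label, confidence});
                matched = true;
            }
            if (!matched) {
                rows.add(new String[] {text, null, null, null});
            }
        }
        rows.sort(Comparator.comparing((String[] r) -> r[0]));
        return rows;
    }
}
```

With such an index the triples are scanned once per content item, and the
rows for each of the UI types can then be produced by calling rows(...) with
the respective dc:type value instead of running one SPARQL query per type.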