Inefficient SPARQL query in 
org.apache.stanbol.enhancer.jersey.resource.ContentItemResource
-------------------------------------------------------------------------------------------

                 Key: STANBOL-342
                 URL: https://issues.apache.org/jira/browse/STANBOL-342
             Project: Stanbol
          Issue Type: Improvement
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler


The SPARQL query:

    PREFIX enhancer: <http://fise.iks-project.eu/ontology/>
    PREFIX dc:   <http://purl.org/dc/terms/>
    SELECT ?textAnnotation ?text ?entity ?entity_label ?confidence
    WHERE {
        ?textAnnotation a enhancer:TextAnnotation .
        ?textAnnotation dc:type ?type .
        ?textAnnotation enhancer:selected-text ?text .
        OPTIONAL {
           ?entityAnnotation dc:relation ?textAnnotation .
           ?entityAnnotation a enhancer:EntityAnnotation .
           ?entityAnnotation enhancer:entity-reference ?entity .
           ?entityAnnotation enhancer:entity-label ?entity_label .
           ?entityAnnotation enhancer:confidence ?confidence . 
        }
    }
    ORDER BY
         ?text

is very inefficient on the in-memory RDF model returned by 
ContentItem.getMetadata.

For a content item enhanced with about 150 TextAnnotations and 200 
EntityAnnotations, executing this query for all types supported by the UI 
(Persons, Organizations, Places, Concepts and Others) took about 20 seconds, 
while the enhancement process itself took about 1 second.

I know this part is only used for the HTTP interface at 
"http://{stanbol-instance}/engines" and therefore does not influence the 
performance of the RESTful services. 
However, this interface is usually the first point of contact for potential 
users of Apache Stanbol, so it is very likely that people will get a wrong 
impression of Stanbol's performance if they try to parse longer texts that 
result in a lot of Enhancements.

Because of that, I will replace the current implementation with another one 
that does not require the use of SPARQL.
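
To illustrate the idea, here is a minimal sketch of how the same rows could be 
collected by walking the triples directly instead of using SPARQL. This is not 
the actual replacement code: the Triple and Occurrence records, the class name 
EnhancementIndex and the prefixed predicate strings are hypothetical stand-ins 
for the real RDF API, and records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain strings stand in for the URIs and literals of the real RDF model.
final class EnhancementIndex {
    record Triple(String subject, String predicate, String object) {}

    // One result row, mirroring the variables selected by the SPARQL query.
    record Occurrence(String textAnnotation, String text,
                      String entity, String entityLabel, Double confidence) {}

    static final String RDF_TYPE = "rdf:type";
    static final String DC_TYPE = "dc:type";
    static final String DC_RELATION = "dc:relation";
    static final String TEXT_ANNOTATION = "enhancer:TextAnnotation";
    static final String ENTITY_ANNOTATION = "enhancer:EntityAnnotation";
    static final String SELECTED_TEXT = "enhancer:selected-text";
    static final String ENTITY_REFERENCE = "enhancer:entity-reference";
    static final String ENTITY_LABEL = "enhancer:entity-label";
    static final String CONFIDENCE = "enhancer:confidence";

    static List<Occurrence> collect(List<Triple> graph, String type) {
        // Single pass: index properties by subject, and dc:relation in reverse
        // (TextAnnotation -> EntityAnnotations that refer to it).
        Map<String, Map<String, List<String>>> bySubject = new HashMap<>();
        Map<String, List<String>> suggestions = new HashMap<>();
        for (Triple t : graph) {
            bySubject.computeIfAbsent(t.subject(), k -> new HashMap<>())
                     .computeIfAbsent(t.predicate(), k -> new ArrayList<>())
                     .add(t.object());
            if (DC_RELATION.equals(t.predicate())) {
                suggestions.computeIfAbsent(t.object(), k -> new ArrayList<>())
                           .add(t.subject());
            }
        }

        List<Occurrence> rows = new ArrayList<>();
        for (Map.Entry<String, Map<String, List<String>>> e : bySubject.entrySet()) {
            Map<String, List<String>> props = e.getValue();
            String text = first(props, SELECTED_TEXT);
            if (!has(props, RDF_TYPE, TEXT_ANNOTATION) || !has(props, DC_TYPE, type)
                    || text == null) {
                continue; // not a TextAnnotation of the requested type
            }
            boolean matched = false;
            for (String ea : suggestions.getOrDefault(e.getKey(), List.of())) {
                Map<String, List<String>> ep = bySubject.getOrDefault(ea, Map.of());
                String entity = first(ep, ENTITY_REFERENCE);
                String label = first(ep, ENTITY_LABEL);
                String conf = first(ep, CONFIDENCE);
                // Like the OPTIONAL block: all patterns must match, or none apply.
                if (!has(ep, RDF_TYPE, ENTITY_ANNOTATION)
                        || entity == null || label == null || conf == null) {
                    continue;
                }
                rows.add(new Occurrence(e.getKey(), text, entity, label, Double.valueOf(conf)));
                matched = true;
            }
            if (!matched) {
                rows.add(new Occurrence(e.getKey(), text, null, null, null));
            }
        }
        rows.sort(Comparator.comparing(Occurrence::text)); // ORDER BY ?text
        return rows;
    }

    private static boolean has(Map<String, List<String>> props, String p, String o) {
        return props.getOrDefault(p, List.of()).contains(o);
    }

    private static String first(Map<String, List<String>> props, String p) {
        List<String> values = props.get(p);
        return values == null ? null : values.get(0);
    }
}
```

Calling EnhancementIndex.collect(graph, type) once per UI type costs one pass 
over the triples plus hash lookups, and a caller could build the two indexes 
once and reuse them across types. For simplicity the sketch only reads the 
first value of each single-valued property.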


        
