FUSEKI Concurrent runs issues

Ewa Szwed Wed, 29 Jan 2014 05:12:15 -0800

Hi,

Since this group is so responsive, I would like to seek advice in another
field:



Area: concurrent calls to Fuseki:


I am running concurrent SPARQL queries against Freebase data using
Fuseki and have noticed that, for some queries, running them in parallel
rather than in series makes a big difference in running time, whereas for
others the difference is minimal or non-existent.
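
A comparison like this could be reproduced with a small timing harness. The
following is a minimal sketch only, assuming a Fuseki server at
http://localhost:3030 with a dataset named "ds" (both hypothetical) and using
only the Python standard library; the function names are illustrative and not
part of Fuseki.

```python
import time
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: a local Fuseki server exposing a dataset named "ds".
ENDPOINT = "http://localhost:3030/ds/query"


def run_query(query, endpoint=ENDPOINT):
    """POST one SPARQL query and return the elapsed wall-clock seconds."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        response.read()  # drain the whole result so streaming time is counted
    return time.perf_counter() - start


def time_serial(queries, runner=run_query):
    """Run the queries one after another; return per-query timings."""
    return [runner(q) for q in queries]


def time_concurrent(queries, runner=run_query):
    """Run all queries at once, one thread each; return per-query timings."""
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        return list(pool.map(runner, queries))
```

Comparing time_serial([query] * 5) with time_concurrent([query] * 5) for each
query would show whether per-query time grows under concurrent load.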


For example, my first query (note the new FILTER placement, which improves
performance a lot for me!):


prefix fb: <http://rdf.freebase.com/ns/>

prefix fn: <http://www.w3.org/2005/xpath-functions#>

prefix xsd: <http://www.w3.org/2001/XMLSchema#>

select ?entity ?mID ?height ?wikipedia_url

where

{

    {

         ?mID_raw fb:type.object.type fb:people.person .

         ?mID_raw fb:type.object.name ?entity .

         ?mID_raw fb:people.person.height_meters ?height_in_meters .

         ?mID_raw fb:common.topic.topic_equivalent_webpage ?wikipedia_url .

         FILTER (lang(?entity) = "en" && regex (str(?wikipedia_url),
"en.wikipedia", "i") && !regex (str(?wikipedia_url), "curid=", "i")) .

    }

    BIND(REPLACE(str(?mID_raw), "http://rdf.freebase.com/ns/", "") as ?mID)

    BIND(round(xsd:float(?height_in_meters)* xsd:float("100"))/
xsd:float("100") as ?height_rounded)

    BIND(xsd:float(?height_in_meters)* xsd:float("3.2808") AS ?height_in_feet)

    BIND(str(?height_in_feet) AS ?feet_str_value)

    BIND(str(floor(xsd:decimal(?feet_str_value))) AS ?feet_final)

    BIND(round(xsd:float(?height_in_feet -
floor(xsd:decimal(?feet_str_value))) * 12) AS ?inches)

    BIND(str(floor(xsd:decimal(str(?inches)))) as ?inches_final)

    BIND(fn:concat(?feet_final, "' ", ?inches_final, "\" (", ?height_rounded, " m)") AS ?height)

}


This query has the following runtime for a single query: 2 mins, 44 seconds

and for 5 concurrent queries: 24 mins, 27 seconds

Whereas for our second query:


 prefix fb: <http://rdf.freebase.com/ns/>

 prefix fn: <http://www.w3.org/2005/xpath-functions#>

 select ?entity ?mID ?age_at_death ?wikipedia_url

 where

{

   {

        ?mID_raw fb:type.object.type fb:people.person .

        ?mID_raw fb:type.object.type fb:people.deceased_person .

        ?mID_raw fb:type.object.name ?entity .

        ?mID_raw fb:people.deceased_person.date_of_death ?date_of_death .

        ?mID_raw fb:people.person.date_of_birth ?date_of_birth .

        ?mID_raw fb:common.topic.topic_equivalent_webpage ?wikipedia_url .

        FILTER (lang(?entity) = "en" && regex (str(?wikipedia_url),
"en.wikipedia", "i") && !regex (str(?wikipedia_url), "curid=", "i")).

   }

   BIND(REPLACE(str(?mID_raw), "http://rdf.freebase.com/ns/", "") as ?mID)

   BIND(fn:year-from-dateTime(?date_of_birth) AS ?year_of_birth)

   BIND(fn:year-from-dateTime(?date_of_death) AS ?year_of_death)

   BIND(str(floor(fn:days-from-duration(?date_of_death -
?date_of_birth) / 365)) as ?age)

   BIND(fn:concat(?age, " (", ?year_of_birth, "-", ?year_of_death, ")"
) AS ?age_at_death)

}


This query has the following runtime for a single query: 5 mins, 35 seconds

Average for 5 concurrent queries: 5 mins, 35 seconds

Does anybody have any insight into why we are seeing such different behavior
between the two queries when we run them concurrently?

What should our expectations be when we run concurrent queries against
Fuseki?

I would guess that the time should be more or less the same regardless of
load, but if that is the general expectation, why do we see such a big
difference for the first query?


Also, when executing the second query above, we see hundreds of
lines similar to the following printed to the log:

05:39:35 WARN  NodeValue            :: Datatype format exception:
"2008-05-16T09"^^xsd:dateTime

We know that this problem originates with the import: we got a number of
WARNs while importing the data using tdbloader.
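
For reference, a value such as "2008-05-16T09" lacks the minutes and seconds
that the xsd:dateTime lexical form requires. A minimal sketch of that shape
check in Python, using a simplified regular expression rather than the full
XSD grammar (the function name is illustrative only):

```python
import re

# Simplified xsd:dateTime lexical form: date, 'T', hh:mm:ss, optional
# fractional seconds and optional timezone. Range checks (month <= 12 and
# so on) are deliberately left out of this sketch.
XSD_DATETIME = re.compile(
    r"-?\d{4,}-\d{2}-\d{2}"        # year-month-day
    r"T\d{2}:\d{2}:\d{2}(\.\d+)?"  # hour:minute:second[.fraction]
    r"(Z|[+-]\d{2}:\d{2})?"        # optional timezone
)


def looks_like_xsd_datetime(lexical):
    """Return True if the string has the shape of an xsd:dateTime literal."""
    return XSD_DATETIME.fullmatch(lexical) is not None


print(looks_like_xsd_datetime("2008-05-16T09"))        # the value from the log
print(looks_like_xsd_datetime("2008-05-16T09:00:00"))  # a fully specified value
```

A check of this kind could be run over the source data before loading to
count how many date literals are truncated in this way.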

When we remove the BIND clauses we do not see these warnings in the log, and
the query runs a lot faster. Any ideas how to overcome this?
