Hi Omar,

Thanks for the detail. I've created an example locally.

The change is JENA-1943 (Increase Precision of Decimal Calculation Results) and partly involved is JENA-1984 (XSD casting fixes). The division isn't exact in xsd:decimal (factor of 3 in 7200) so the precision was causing an exception.

Using xsd:double, and no further casting is the right way to achieve the task ... except I now don't understand why it costs more than the xsd:decimal way (using jena development after fixes)

Recorded as: JENA-2116

    Andy

On 08/06/2021 02:34, omar.k...@wellsfargo.com wrote:
Hi Andy,

The datatype of ?pix is normally an integer.

I also found out that division was twice as slow for integers here in f17 as f14/16, the below filter may not make a lot of sense. That said, I was using it to see if it had to do with fundamental arithmetic operations.

FILTER (((?pix)/(?pix)) = 1002 )

Takes ~ 2-3 seconds in f16

Takes ~ 6 seconds in f17

It led me to try what follows and could be telling. I cast the numerator to double and it was much faster for the set and yielded the same results of when I ran with an integer.

This takes ~ 3 seconds in BOTH f16/f17

FILTER (((?pix)-FLOOR((xsd:double(?pix))/7200)*7200) = (1002) )

F16

real    0m2.794s

user    0m0.005s

sys     0m0.004s

F17

real    0m2.783s

user    0m0.006s

sys     0m0.004s

This takes ~ 20 s in f17  ~ 6 seconds in F16

FILTER (((?pix)-FLOOR(((?pix))/7200)*7200) = (1002) )

F16

real    0m5.774s

user    0m0.006s

sys     0m0.004s

F17

real    0m20.020s

user    0m0.006s

sys     0m0.004s

What follows are the timings on the query you asked for:

For this query

select (count(*) AS ?C)

WHERE {

 ?record <https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID> ?pix .

FILTER (xsd:integer(xsd:integer(?pix)-FLOOR(xsd:integer(?pix)/7200)*7200) = xsd:integer(1002) )

}

F16

bash-4.2$ time curl -k --data-urlencode qu...@canary5.rq -H 'Accept:application/n-triples' http://localhost:3030/kg/sparql

<?xml version="1.0"?>

<sparql xmlns="http://www.w3.org/2005/sparql-results#";>

   <head>

     <variable name="C"/>

   </head>

   <results>

     <result>

       <binding name="C">

        <literal datatype="http://www.w3.org/2001/XMLSchema#integer";>142</literal>

       </binding>

     </result>

   </results>

</sparql>

real    0m7.668s

user    0m0.005s

sys     0m0.004s

F17

bash-4.2$ time curl -k --data-urlencode qu...@canary5.rq -H 'Accept:application/n-triples' http://localhost:3030/kg/sparql

<?xml version="1.0"?>

<sparql xmlns="http://www.w3.org/2005/sparql-results#";>

   <head>

     <variable name="C"/>

   </head>

   <results>

     <result>

       <binding name="C">

        <literal datatype="http://www.w3.org/2001/XMLSchema#integer";>142</literal>

      </binding>

     </result>

   </results>

</sparql>

real    0m23.995s

user    0m0.004s

sys     0m0.005s

F16

bash-4.2$ time curl -k --data-urlencode qu...@canary6.rq -H 'Accept:application/n-triples' http://localhost:3030/kg/sparql

<?xml version="1.0"?>

<sparql xmlns="http://www.w3.org/2005/sparql-results#";>

   <head>

     <variable name="C"/>

   </head>

   <results>

     <result>

       <binding name="C">

        <literal datatype="http://www.w3.org/2001/XMLSchema#integer";>142</literal>

       </binding>

     </result>

   </results>

</sparql>

real    0m6.347s

user    0m0.006s

sys     0m0.003s

F17

bash-4.2$ time curl -k --data-urlencode qu...@canary6.rq -H 'Accept:application/n-triples' http://localhost:3030/kg/sparql

<?xml version="1.0"?>

<sparql xmlns="http://www.w3.org/2005/sparql-results#";>

   <head>

     <variable name="C"/>

   </head>

   <results>

     <result>

       <binding name="C">

        <literal datatype="http://www.w3.org/2001/XMLSchema#integer";>142</literal>

       </binding>

     </result>

   </results>

</sparql>

real    0m20.952s

user    0m0.004s

sys     0m0.005s

stats about the data:

select (count(*) AS ?C)

WHERE {

 ?record <https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID> ?pix .

}

time curl -k --data-urlencode qu...@canary7.rq -H 'Accept:application/n-triples' http://localhost:3030/kg/sparql

<?xml version="1.0"?>

<sparql xmlns="http://www.w3.org/2005/sparql-results#";>

   <head>

     <variable name="C"/>

   </head>

   <results>

     <result>

       <binding name="C">

        <literal datatype="http://www.w3.org/2001/XMLSchema#integer";>1000000</literal>

       </binding>

     </result>

   </results>

</sparql>

real    0m0.546s

user    0m0.004s

sys     0m0.004s

select (count(distinct ?record) AS ?R)

WHERE {

 ?record <https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID> ?pix .

}

time curl -k --data-urlencode qu...@canary8.rq -H 'Accept:application/n-triples' http://localhost:3030/kg/sparql  > canary8.out

  % Total    % Received % Xferd  Average Speed   Time    Time     Time Current

                                 Dload  Upload   Total   Spent    Left Speed

100  104k    0   313  100  104k      0    228  0:07:48  0:07:47 0:00:01     0

real    7m47.731s

user    0m0.034s

sys     0m0.041s

<?xml version="1.0"?>

<sparql xmlns="http://www.w3.org/2005/sparql-results#";>

   <head>

     <variable name="R"/>

   </head>

   <results>

     <result>

       <binding name="R">

        <literal datatype="http://www.w3.org/2001/XMLSchema#integer";>1000000</literal>

       </binding>

     </result>

   </results>

</sparql>

Best,
Omar

*From:*Andy Seaborne <a...@apache.org>
*Sent:* Friday, June 04, 2021 8:03 AM
*To:* users@jena.apache.org
*Subject:* Re: possible SPARQL performance regression Fuseki 3.14/3.16 -> 3.17

Hi Omar,

Thank for the detailed report.

It might be the casting -JENA-1984  
<https://urldefense.com/v3/__https:/issues.apache.org/jira/browse/JENA-1984__;!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvFVRlY2LI$>
  - which was a change

3.16.0->3.17.0

Could you time these queries please on a fast and on a slow system?

It is checking it is the filter.

select (count(*) AS ?C)

WHERE {

  ?record<https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID>  
<https://urldefense.com/v3/__https:/data.morelyacmewidgets.com/mappings/WIDGET*CONSUMERID__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvFGIZWxbk$>
  ?pix .

  FILTER (xsd:integer(xsd:integer(?pix)-FLOOR(xsd:integer(?pix)/7200)*7200) = 
xsd:integer(1002) )

}

and (less casting):

select (count(*) AS ?C)

WHERE {

  ?record<https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID>  
<https://urldefense.com/v3/__https:/data.morelyacmewidgets.com/mappings/WIDGET*CONSUMERID__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvFGIZWxbk$>
  ?pix .

  FILTER (xsd:integer(?pix)-FLOOR(xsd:integer(?pix)/7200)*7200 = 1002 )

}

It would be useful to know some count stats about the data:

select (count(*) AS ?C)

WHERE {

  ?record<https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID>  
<https://urldefense.com/v3/__https:/data.morelyacmewidgets.com/mappings/WIDGET*CONSUMERID__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvFGIZWxbk$>
  ?pix .

}

select (count(distinct ?record) AS ?R)

WHERE {

  ?record<https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID>  
<https://urldefense.com/v3/__https:/data.morelyacmewidgets.com/mappings/WIDGET*CONSUMERID__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvFGIZWxbk$>
  ?pix .

}

and what is the datatype of ?pix normally?

     Andy

On 04/06/2021 04:09, omar.k...@wellsfargo.com.INVALID <mailto:omar.k...@wellsfargo.com.INVALID> wrote:

    We have observed some surprising performance behavior with different 
versions of fuseki.

    The versions used in this example are 3.14.0 (referred to as "f14" henceforth), 3.16.0 
("f16" henceforth), and 3.17.0 ("f17" henceforth)  .

    We have a large database in TDB1 (built using Jena TDB version 3.14.0).  
There are roughly 140 million triples in the dataset.

    We discovered that many of our queries take much longer using f17 than 
using f14 or f16; that's right, the newer fuseki is much slower than the older 
one.

    We tested more with f14, but noticed that perhaps the change in performance 
wasn't realized till after f16 after lightly testing.

    We isolated a simple query that exhibits this behavior; this query reliable 
runs 4x slower in f17 than f14.

    select *

    WHERE {  ?record<https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID>  
<https://urldefense.com/v3/__https:/data.morelyacmewidgets.com/mappings/WIDGET*CONSUMERID__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvFGIZWxbk$>
  ?pix .

    FILTER (xsd:integer(xsd:integer(?pix)-FLOOR(xsd:integer(?pix)/7200)*7200) = 
xsd:integer(1002) )

    ?record<https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID>  
<https://urldefense.com/v3/__https:/data.morelyacmewidgets.com/mappings/WIDGET*CONSUMERID__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvFGIZWxbk$>
  ?CONSUMERID .

    }

    The modular arithmetic calculation in the filter is used to limit the 
number of records processed at any one time, and is an essential part of the 
query.

    Here are some sample run stats.

    Using f14:

    time curl -k --data-urlencodequ...@canary.rq  <mailto:qu...@canary.rq>   
http://localhost:3030/kg/sparql  
<https://urldefense.com/v3/__http:/localhost:3030/kg/sparql__;!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvF-cMtJ6c$>
   > /dev/null

    real    0m7.159s

    user    0m0.008s

    sys     0m0.002s

    Using f17:

    time curl -k --data-urlencodequ...@canary.rq  <mailto:qu...@canary.rq>   
http://localhost:3030/kg/sparql  
<https://urldefense.com/v3/__http:/localhost:3030/kg/sparql__;!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvF-cMtJ6c$>
   > /dev/null

    real    0m23.192s

    user    0m0.004s

    sys     0m0.009s

    We ran this test several times, and got essentially the same result (within 
about 5% on each end).

    We also ran this with JENA_HOME and JENAROOT set to a 3.15 Jena 
installation and a 3.17 Jena installation when we started fuseki; this made no 
difference in the timing.

    We asked tdbquery to explain the query; the result was not very 
illuminating:

    16:54:06 INFO  exec       :: ALGEBRA

       (filter (= (<http://www.w3.org/2001/XMLSchema#integer>  
<https://urldefense.com/v3/__http:/www.w3.org/2001/XMLSchema*integer__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvF2e1vfJA$>
  (- (<http://www.w3.org/2001/XMLSchema#integer>  
<https://urldefense.com/v3/__http:/www.w3.org/2001/XMLSchema*integer__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvF2e1vfJA$>
  ?pix) (* (floor (/ (<http://www.w3.org/2001/XMLSchema#integer>  
<https://urldefense.com/v3/__http:/www.w3.org/2001/XMLSchema*integer__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvF2e1vfJA$>
  ?pix) 7200)) 7200))) (<http://www.w3.org/2001/XMLSchema#integer>  
<https://urldefense.com/v3/__http:/www.w3.org/2001/XMLSchema*integer__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvF2e1vfJA$>
  1002\

    ))

         (quadpattern

           (quad <urn:x-arq:DefaultGraphNode%3E 
?record<https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID>  
<https://urldefense.com/v3/__https:/data.morelyacmewidgets.com/mappings/WIDGET*CONSUMERID__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvFGIZWxbk$>
  ?pix)

           (quad <urn:x-arq:DefaultGraphNode%3E 
?record<https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID>  
<https://urldefense.com/v3/__https:/data.morelyacmewidgets.com/mappings/WIDGET*CONSUMERID__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvFGIZWxbk$>
  ?CONSUMERID)

         ))

    16:54:06 INFO  exec       :: TDB

       (filter (= (<http://www.w3.org/2001/XMLSchema#integer>  
<https://urldefense.com/v3/__http:/www.w3.org/2001/XMLSchema*integer__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvF2e1vfJA$>
  (- (<http://www.w3.org/2001/XMLSchema#integer>  
<https://urldefense.com/v3/__http:/www.w3.org/2001/XMLSchema*integer__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvF2e1vfJA$>
  ?pix) (* (floor (/ (<http://www.w3.org/2001/XMLSchema#integer>  
<https://urldefense.com/v3/__http:/www.w3.org/2001/XMLSchema*integer__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvF2e1vfJA$>
  ?pix) 7200)) 7200))) (<http://www.w3.org/2001/XMLSchema#integer>  
<https://urldefense.com/v3/__http:/www.w3.org/2001/XMLSchema*integer__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvF2e1vfJA$>
  1002\

    ))

         (quadpattern

           (quad <urn:x-arq:DefaultGraphNode%3E 
?record<https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID>  
<https://urldefense.com/v3/__https:/data.morelyacmewidgets.com/mappings/WIDGET*CONSUMERID__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvFGIZWxbk$>
  ?pix)

           (quad <urn:x-arq:DefaultGraphNode%3E 
?record<https://data.morelyacmewidgets.com/mappings/WIDGET#CONSUMERID>  
<https://urldefense.com/v3/__https:/data.morelyacmewidgets.com/mappings/WIDGET*CONSUMERID__;Iw!!F9svGWnIaVPGSwU!7S3HzuAWctDmhv6KWMcNsYoSva_cSekBT2FUSSKtPvYu6YKX95pawAjKajvFGIZWxbk$>
  ?CONSUMERID)

         ))

    Any suggestions on how we could get the same level of performance as we 
were with f14(f16) in f17?

    Kind regards,

    Omar

Reply via email to