I looked at some more queries that worked in Jena 2.x but seem to hang
in 3.x. They all follow the same pattern: a complex FILTER on string
values. Rewriting the filter conditions into subgraph patterns solved
the problem.

Is this a defect induced by the algebra optimizations in 3.x? Or is it
more proper to apply string filters in the manner you suggested, by
enclosing them in subgraph patterns close to the triples they filter?

One case was a little more complex. The original query looked like
this:

CONSTRUCT {
  ?var1 :p1 false .
}
WHERE {
  FILTER ((?var2 != "str1" && !strstarts(?var3,"str2")))
  ?var1 :p2 ?var3 ;
        :p3 ?var2 ;
        :p4 "str3" ;
        :p5 "str4" ;
        :p6 "str5" .
  FILTER NOT EXISTS {
    FILTER (((?var4 = "str6" || ?var4 = "str7" || ?var4 = "str8" ||
              ?var4 = "str9" || ?var4 = "str10" || ?var4 = "str11" ||
              ?var4 = "str12" || ?var4 = "str13")))
    ?var5 :p7 ?var4 ;
          :p8 ?var3 .
  }
}

I initially rewrote the FILTER NOT EXISTS clause to read:

FILTER NOT EXISTS {
  {
    FILTER (((?var4 = "str6" || ?var4 = "str7" || ?var4 = "str8" ||
              ?var4 = "str9" || ?var4 = "str10" || ?var4 = "str11" ||
              ?var4 = "str12" || ?var4 = "str13")))
    ?var5 :p7 ?var4 .
  }
  ?var5 :p8 ?var3 .
}

which still seemed to hang. Reordering the patterns inside the FILTER
NOT EXISTS clause as follows solved the problem:

FILTER NOT EXISTS {
  ?var5 :p8 ?var3 .
  {
    FILTER (((?var4 = "str6" || ?var4 = "str7" || ?var4 = "str8" ||
              ?var4 = "str9" || ?var4 = "str10" || ?var4 = "str11" ||
              ?var4 = "str12" || ?var4 = "str13")))
    ?var5 :p7 ?var4 .
  }
}
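
For reference, the working form of the whole query might be assembled as
follows. This is only a sketch: it assumes the outer FILTER and the
triple patterns on ?var1 stay exactly as in the original query, and the
PREFIX declaration is a placeholder, since the real namespace is not
shown in this message.

```sparql
# Placeholder namespace (assumption); substitute the real one.
PREFIX : <http://example.org/>

CONSTRUCT {
  ?var1 :p1 false .
}
WHERE {
  # Outer filter and triple patterns, unchanged from the original query.
  FILTER ((?var2 != "str1" && !strstarts(?var3, "str2")))
  ?var1 :p2 ?var3 ;
        :p3 ?var2 ;
        :p4 "str3" ;
        :p5 "str4" ;
        :p6 "str5" .
  FILTER NOT EXISTS {
    # The pattern that shares ?var3 with the outer query comes first ...
    ?var5 :p8 ?var3 .
    # ... followed by the group that keeps the ?var4 filter next to the
    # only triple pattern that mentions ?var4.
    {
      FILTER (((?var4 = "str6" || ?var4 = "str7" || ?var4 = "str8" ||
                ?var4 = "str9" || ?var4 = "str10" || ?var4 = "str11" ||
                ?var4 = "str12" || ?var4 = "str13")))
      ?var5 :p7 ?var4 .
    }
  }
}
```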

I have noticed other cases where the order of triple patterns and BGPs
makes quite a difference in execution time, but I can't find any
systematic rule behind it. Are there any guidelines for ordering the
components of a complex query (including UNION and OPTIONAL clauses) to
optimize performance? Can you tell anything from a static analysis of
the SPARQL algebra?

Regards,
--Paul 



On Fri, 2016-09-16 at 08:37 -0500, Paul Tyson wrote:
> Andy,
> 
> With that rewrite, the 3.x tdbquery works as expected.
> 
> I will investigate further this weekend and send other queries that don't 
> work in 3.x.
> 
> Regards,
> --Paul
> 
> > On Sep 16, 2016, at 04:26, Andy Seaborne <a...@apache.org> wrote:
> > 
> > Paul,  If you could try the query below which mimics the effect of placing 
> > the ?var4 filter part, it will help determine if this is a filter placement 
> > issue or not.
> > 
> > The difference is that first basic graph pattern is inside a {} with the 
> > relevant part of the filter expression.
> > 
> >    Andy
> > 
> > 
> > PREFIX  :     <http://example/>
> > 
> > SELECT  *
> > WHERE
> >  { FILTER ( ( ?var3 = "str1" ) || ( ?var3 = "str2" ) )
> >    { ?var2  :p1  ?var4 ;
> >             :p2  ?var3
> >      FILTER ( ! ( ( ( ?var4 = "" ) ||
> >               ( ?var4 = "str3" ) ) ||
> >               regex(?var4, "pat1") ) )
> >    }
> >    {   { ?var1  :p3  ?var4 }
> >      UNION
> >        { ?var1  :p4  ?var4 }
> >    }
> >  }
> > 
> > 
> >    Andy
> > 
> > 
> >> On 14/09/16 13:15, Paul Tyson wrote:
> >>> On Wed, 2016-09-14 at 10:57 +0100, Andy Seaborne wrote:
> >>> Hi Paul,
> >>> 
> >>> It's difficult to tell what's going on from your report. Plain strings
> >>> are not quite identical in RDF 1.0 and RDF 1.1, so I hope you have
> >>> reloaded the data for running Jena 3.x.
> >> 
> >> I admit I have not studied the subtleties around string literals with
> >> and without datatype tags. None of my data loadfiles have tagged string
> >> literals, nor do my queries. Are you saying they should?
> >> 
> >>> 
> >>> On less data, does either case produce the wrong answers?
> >> 
> >> I'll produce a smaller dataset to test.
> >> 
> >>> The regex is not being pushed inwards in the same way which may be an
> >>> issue - it "all depends" on the data.
> >>> 
> >>> A smaller query exhibiting a timing difference would be very helpful.
> >>> Are all parts of the FILTER necessary for the effect?
> >> Yes, they eliminate spurious matches.
> >> 
> >>> 
> >>>    Andy
> >>> 
> >>> Unrelated:
> >>> 
> >>> {
> >>> ?var1 :p3 ?var4 .
> >>> } UNION {
> >>> ?var1 :p4 ?var4 .
> >>> }
> >>> 
> >>> can be written
> >>> 
> >>> ?var1 (:p3|:p4) ?var4
> >> Yes, but I generate these queries from RIF source, and UNION is easier
> >> for the general RIF statement "Or(x,y)". The surface syntax doesn't make
> >> any difference in the algebra, does it?
> >> 
> >> Regards,
> >> --Paul
> >> 
> >>>> On 14/09/16 02:01, Paul Tyson wrote:
> >>>> I have some queries that worked fine in jena-2.13.0 but not in
> >>>> jena-3.1.0, using the same data.
> >>>> 
> >>>> For a long time I've been running a couple dozen queries regularly over
> >>>> a large (900M triples) TDB, using jena-2.13.0. When I recently upgraded
> >>>> to jena-3.1.0, I found that 5 of these queries would not return (ran
> >>>> forever). qparse revealed that the sparql algebra is quite different in
> >>>> 2.13.0 and 3.1.0 (or apparently any 3.n.n version).
> >>>> 
> >>>> Here is a sample query that worked in 2.13.0 but not in 3.1.0, along
> >>>> with the algebra given by qparse --explain for 2.13.0 and 3.1.0:
> >>>> 
> >>>> prefix : <http://example.org>
> >>>> CONSTRUCT {
> >>>> ?var1 <http://www.w3.org/2004/02/skos/core#exactMatch> ?var2 .
> >>>> }
> >>>> WHERE {
> >>>> FILTER (((?var3 = "str1" || ?var3 = "str2") && !(?var4 = "" || ?var4 =
> >>>> "str3" || regex(?var4,"pat1"))))
> >>>> ?var2 :p1 ?var4 ; :p2 ?var3 .
> >>>> {{
> >>>> ?var1 :p3 ?var4 .
> >>>> } UNION {
> >>>> ?var1 :p4 ?var4 .
> >>>> }}
> >>>> }
> >>>> 
> >>>> Jena-2.13.0 produces algebra:
> >>>> (prefix ((: <http://example.org>))
> >>>>  (sequence
> >>>>    (filter (|| (= ?var3 "str1") (= ?var3 "str2"))
> >>>>      (sequence
> >>>>        (filter (! (|| (|| (= ?var4 "") (= ?var4 "str3")) (regex ?var4
> >>>> "pat1")))
> >>>>          (bgp (triple ?var2 :p1 ?var4)))
> >>>>        (bgp (triple ?var2 :p2 ?var3))))
> >>>>    (union
> >>>>      (bgp (triple ?var1 :p3 ?var4))
> >>>>      (bgp (triple ?var1 :p4 ?var4)))))
> >>>> 
> >>>> Jena-3.1.0 produces algebra:
> >>>> (prefix ((: <http://example.org>))
> >>>>  (filter (! (|| (|| (= ?var4 "") (= ?var4 "str3")) (regex ?var4
> >>>> "pat1")))
> >>>>    (disjunction
> >>>>      (assign ((?var3 "str1"))
> >>>>        (sequence
> >>>>          (bgp
> >>>>            (triple ?var2 :p1 ?var4)
> >>>>            (triple ?var2 :p2 "str1")
> >>>>          )
> >>>>          (union
> >>>>            (bgp (triple ?var1 :p3 ?var4))
> >>>>            (bgp (triple ?var1 :p4 ?var4)))))
> >>>>      (assign ((?var3 "str2"))
> >>>>        (sequence
> >>>>          (bgp
> >>>>            (triple ?var2 :p1 ?var4)
> >>>>            (triple ?var2 :p2 "str2")
> >>>>          )
> >>>>          (union
> >>>>            (bgp (triple ?var1 :p3 ?var4))
> >>>>            (bgp (triple ?var1 :p4 ?var4))))))))
> >>>> 
> >>>> Thanks for any insight or assistance into this problem.
> >>>> 
> >>>> Regards,
> >>>> --Paul
> >> 
> >> 
> 

