Hi Andy.

That's cracked it. I was wondering about the sub-select route, but wasn't sure how to code the intersection part. I just tweaked it to return the score from the text query.

Your formulation 200 OK (231 ms)

That's 200 OK by me...

Enjoy the holidays

Mark

Technology Lead, Iotic Labs
+44 7973 674404
[email protected]
https://www.iotic-labs.com

On 23/12/15 17:03, Andy Seaborne wrote:
> Hi Mark,
>
> Tricky.
>
> There isn't a good way to turn off or modify optimization for parts of a
> query without affecting the whole query. Jena 3.0.1 had a combination
> of changes - hash join but also stronger flattening queries into the
> form you don't want for the first part.
>
> The best I have come up with is:
> (no special flags needed)
>
>
> SELECT ?score ?ent
> WHERE {
>   { SELECT ?ent { ?ent spatial:nearby "ABC" . } OFFSET 0 }
>   { SELECT ?ent { ?ent text:query "DEF" . } OFFSET 0 }
>   ... rest of query ...
>
> }
>
> i.e.
>
> SELECT ?score ?ent ?entLabel ?lat ?long ?point ?pointType ?pointLabel
> WHERE {
>   { SELECT ?ent {
>       ?ent spatial:nearby(51.507999420166016 -0.10999999940395355 70.8018078804016'km') .
>   } OFFSET 0 }
>   { SELECT ?ent {
>       (?ent ?score) text:query ('environment' 'lang:en') .
>   } OFFSET 0 }
>
>   ?ent rdf:type iotic:Entity
>
>   OPTIONAL {
>     ?ent rdfs:label ?entLabel .
>     FILTER langMatches( lang(?entLabel), 'en' ) .
>   }
>
>   OPTIONAL {?ent geo:lat ?lat . ?ent geo:long ?long}
>   ?ent iotic:Advertises ?point .
>   ?point rdf:type iotic:Point .
>   ?point iotic:PointType ?pointType .
>
>   OPTIONAL {
>     ?point rdfs:label ?pointLabel .
>     FILTER langMatches( lang(?pointLabel), 'en' ) .
>   }
> }
>
>
> On 23/12/15 11:03, Mark Wharton wrote:
>> Hi Andy.
>>
>> More experiments this morning. I originally only sent you a small part
>> of a larger query just to expose the problem in its simplest form. And
>> your switches work well in that case (i.e. first formulation below
>> *with* the comments.)
>>
>> But... There's a problem when using the switches in that the rest of the
>> query wants to get the rdfs:label and various other properties. This
>> destroys the performance gains.
>>
>> I've tried "yours" and "mine" with and without the switches and then the
>> separate parts on their own to see how that goes.
>>
>> 1) "yours"
>> ==========
>> This formulation (with the switches and comments in place) - 384 ms
>>
>> SELECT ?score ?ent ?entLabel ?lat ?long ?point ?pointType ?pointLabel
>> WHERE {
>>   { ?ent spatial:nearby(51.507999420166016 -0.10999999940395355 70.8018078804016'km') }
>>   { (?ent ?score) text:query ('environment' 'lang:en') . FILTER EXISTS {?ent rdf:type iotic:Entity} }
>>
>> # OPTIONAL {
>> #   ?ent rdfs:label ?entLabel .
>> #   FILTER langMatches( lang(?entLabel), 'en' ) .
>> # }
>> #
>> # OPTIONAL {?ent geo:lat ?lat . ?ent geo:long ?long}
>> # ?ent iotic:Advertises ?point .
>> # ?point rdf:type iotic:Point .
>> # ?point iotic:PointType ?pointType .
>> #
>> # OPTIONAL {
>> #   ?point rdfs:label ?pointLabel .
>> #   FILTER langMatches( lang(?pointLabel), 'en' ) .
>> # }
>>
>> }
>>
>> Uncomment the lines and the performance drops to - 7.165 ms
>>
>> 2) "mine"
>> =========
>> The below formulation with the switches in place 11.221 secs
>> The below without the switches. 5.371 secs
>>
>> SELECT ?score ?ent ?entLabel ?lat ?long ?point ?pointType ?pointLabel
>> WHERE {
>>   ?ent spatial:nearby(51.507999420166016 -0.10999999940395355 70.8018078804016'km') .
>>   (?ent ?score) text:query ('environment' 'lang:en') . FILTER EXISTS {?ent rdf:type iotic:Entity} .
>>
>>   OPTIONAL {
>>     ?ent rdfs:label ?entLabel .
>>     FILTER langMatches( lang(?entLabel), 'en' ) .
>>   }
>>
>>   OPTIONAL {?ent geo:lat ?lat . ?ent geo:long ?long}
>>   ?ent iotic:Advertises ?point .
>>   ?point rdf:type iotic:Point .
>>   ?point iotic:PointType ?pointType .
>>
>>   OPTIONAL {
>>     ?point rdfs:label ?pointLabel .
>>     FILTER langMatches( lang(?pointLabel), 'en' ) .
>>   }
>>
>> }
>>
>> 3) Separately
>> ==============
>> Completely on their own:
>> ========================
>> i.e. just the ?ent spatial:nearby line
>> the spatial query on its own takes 50 ms
>> i.e. just the text:query line
>> and the text on its own takes 258 ms
>>
>> With the OPTIONAL {} and other properties
>> =========================================
>> Spatial and other properties 135 ms
>> Text and other properties 854 ms
>>
>> Again, repeated thanks for your help.
>>
>> Mark
>>
>> Technology Lead, Iotic Labs
>> [email protected]
>> https://www.iotic-labs.com
>>
>> On 22/12/15 17:22, Andy Seaborne wrote:
>>> Mark,
>>>
>>> Thanks for the experiment results.
>>>
>>> On 22/12/15 15:47, Mark Wharton wrote:
>>>> Query below run without Andy's switches.
>>>> INFO [5] 200 OK (4.985 s)
>>>>
>>>> Query below run with Andy's switches.
>>>> INFO [1] 200 OK (840 ms)
>>>>
>>>> Them's some magic switches. Thanks, Andy.
>>>>
>>>> Do they have any impact (negative or positive) on any other SPARQL
>>>> operations? I'm only curious as you've solved our main problem in that
>>>> our "search" query was very slow. There's nowhere else that uses the
>>>> text and spatial indexes for retrieval.
>>>
>>> This depends on an internal change in the latest release (Jena 3.0.1,
>>> Fuseki 2.3.1). Prior to that it will not make the same difference.
>>> Specifically, unoptimized joins are now hash joins.
>>>
>>> But that is not a good choice for the "?ent rdf:type iotic:Entity"
>>> triple pattern. The system can't distinguish different cases involving
>>> external indexes as it knows not very much about the index details.
>>>
>>> Adding
>>>
>>> FILTER EXISTS { ?ent rdf:type iotic:Entity }
>>>
>>> might work because the triple pattern is really a check, not a match
>>> setting a variable.
>>>
>>> A plain "?ent rdf:type iotic:Entity" will retrieve all things of that
>>> class regardless of spatial and text query when those optimizations
>>> are off.
>>>
>>> Andy
>>>
>>>>
>>>> Many thanks for this help so close to the holiday season. Happy
>>>> holidays to you all at Jena - keep up the good work.
>>>>
>>>> Mark
>>>>
>>>>
>>>> Technology Lead, Iotic Labs
>>>> +44 7973 674404
>>>> [email protected]
>>>> https://www.iotic-labs.com
>>>>
>>>> On 22/12/15 11:49, Andy Seaborne wrote:
>>>>> Mark - here is another way.
>>>>>
>>>>> This query:
>>>>>
>>>>> SELECT ?score ?ent
>>>>> WHERE {
>>>>>   { ?ent spatial:nearby ( .... ) }
>>>>>   { ?ent text:query ( ..... ) }
>>>>>   # No ?ent rdf:type iotic:Entity .
>>>>>   # This focuses the query on the presenting issue.
>>>>> }
>>>>>
>>>>> and then run Fuseki with the following flags:
>>>>>
>>>>> --set arq:optIndexJoinStrategy=false --set arq:optMergeBGPs=false
>>>>>
>>>>> for however you are running the server.
>>>>>
>>>>> You need both --set
>>>>>
>>>>> The service script will not do this very easily - if environment
>>>>> variable FUSEKI_ARGS is set it might do. Untested.
>>>>>
>>>>> It is easier to run the server standalone:
>>>>>
>>>>> (Linux, Mac)
>>>>>
>>>>> The "fuseki-server" script should pass these in:
>>>>>
>>>>> fuseki-server \
>>>>>   --set arq:optIndexJoinStrategy=false --set arq:optMergeBGPs=false \
>>>>>   .. other args ..
>>>>>
>>>>> (Windows or any platform)
>>>>>
>>>>> You can call the server java code directly: all one line:
>>>>>
>>>>>
>>>>> java -Xmx1200M -jar fuseki-server.jar --set arq:optIndexJoinStrategy=false --set arq:optMergeBGPs=false .. other args ..
>>>>>
>>>>> you'll need to put the full path name of fuseki-server.jar
>>>>>
>>>>> Sorry - I don't have your setup to test this fully. I did make sure that
>>>>> the reworked query does lead to an execution plan that is different and
>>>>> should yield some information about the situation.
>>>>>
>>>>> Andy
>>>>>
>>>>> On 22/12/15 09:50, Andy Seaborne wrote:
>>>>>> On 22/12/15 07:06, Mark Wharton wrote:
>>>>>>> Ah, wheels within wheels.
>>>>>>>
>>>>>>> The formulation with the filter in it is fine, except that if you want
>>>>>>> to search for more than one word or you match in label and comment then
>>>>>>> the UNION formulation returns you duplicate rows. This isn't a problem
>>>>>>> with the Lucene search which is why (I now remember) I used it in the
>>>>>>> first place.
>>>>>>>
>>>>>>> I'm not sure what version of jena I'm using - I just use the fuseki
>>>>>>> release at 2.3.0. Is there a way to find out?
>>>>>>
>>>>>> 3.0.0
>>>>>>
>>>>>> Many of the java commands support --version and the fuseki-server jar
>>>>>> is an all-in-one jar:
>>>>>>
>>>>>> java -cp <YourInstall>/fuseki-server.jar arq.sparql --version
>>>>>>
>>>>>>> What's the status on the JENA-999 and JENA-1093 issues? I see there's
>>>>>>> been some activity on 999 in the last few days. Andy Seaborne's last
>>>>>>> comment seems encouraging.
>>>>>>>
>>>>>>> I don't want to adopt a single version as I'll be stuck forever patching
>>>>>>> back and forward and it will break eventually.
>>>>>>>
>>>>>>> Many thanks for your continued help.
>>>>>>
>>>>>> JENA-999 may sort of help but I'm not that positive because each ?ent
>>>>>> from the first part will be different going into the second part. It
>>>>>> looks to me as if it is the overhead of going out to Lucene. (This is
>>>>>> Lucene right? not Solr?)
>>>>>>
>>>>>> The ideal is some super compilation of the text:query and spatial query
>>>>>> into one big Lucene query.
>>>>>>
>>>>>> What would also be good is to stop the general optimizer (this is
>>>>>> nothing to do with TDB) using an index join. Except that is the better
>>>>>> choice for the rdf:type. This is what the additional {} were trying for,
>>>>>> except the optimizer outsmarted them:
>>>>>>
>>>>>> SELECT ?score ?ent
>>>>>> WHERE {
>>>>>>   ?ent spatial:nearby( ...) .
>>>>>>   (?ent ?score) text:query (...) .
>>>>>>   ?ent rdf:type iotic:Entity .
>>>>>> }
>>>>>>
>>>>>>
>>>>>>
>>>>>> Mark - can you ask the query from Java? If so,
>>>>>>
>>>>>> Add "Optimize.noOptimizer(); " before executing the query. I can't see
>>>>>> a way to do that from setting the environment for Fuseki.
>>>>>>
>>>>>> Or (the effect on time of this is version specific and whether it does
>>>>>> anything useful is a big "maybe") you could try this:
>>>>>>
>>>>>> SELECT ?score ?ent
>>>>>> WHERE {
>>>>>>   { OPTIONAL { ?ent spatial:nearby "ABC" . }}
>>>>>>   { OPTIONAL { ?ent text:query "DEF" } }
>>>>>> }
>>>>>>
>>>>>> Andy
>>>>>>
>>>>>
>>>>>
>>>
>
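
The cost difference discussed in the thread (an index join going out to the text index once per ?ent, versus a hash join that evaluates each side once and intersects the results) can be sketched with a toy model. This is plain Python with dicts standing in for the indexes, not Jena code: SPATIAL_HITS, TEXT_HITS, CountingTextIndex, index_join and hash_join are all made-up names for illustration, and the data is invented.

```python
# Toy model only: these names are hypothetical and are not Jena or Lucene APIs.

SPATIAL_HITS = ["ent1", "ent2", "ent3", "ent4"]      # invented ?ent values near the point
TEXT_HITS = {"ent2": 0.9, "ent4": 0.4, "ent7": 0.7}  # invented ?ent -> ?score matches


class CountingTextIndex:
    """Stand-in for an external text index that counts how often it is consulted."""

    def __init__(self, hits):
        self.hits = hits
        self.calls = 0

    def query_all(self):
        # One round trip that returns every (ent, score) match.
        self.calls += 1
        return dict(self.hits)

    def query_for(self, ent):
        # One round trip for a single, already-bound ?ent.
        self.calls += 1
        return self.hits.get(ent)


def index_join(spatial_hits, text_index):
    """Feed each spatial ?ent into the text index, one lookup per binding."""
    rows = []
    for ent in spatial_hits:
        score = text_index.query_for(ent)
        if score is not None:
            rows.append((ent, score))
    return rows


def hash_join(spatial_hits, text_index):
    """Evaluate the text side once, then intersect on ?ent via a hash table."""
    scores = text_index.query_all()
    return [(ent, scores[ent]) for ent in spatial_hits if ent in scores]
```

Both functions return the same rows; what differs is the number of trips to the text index, which grows with the number of spatial hits for index_join and stays at one for hash_join. As far as the thread suggests, that is the effect the OFFSET 0 sub-selects are meant to obtain for the spatial and text parts, while leaving the rest of the query to the normal optimizer.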
