Re: Query with spatial and text searches.

Mark Wharton Wed, 23 Dec 2015 21:39:07 -0800

Hi Andy.

That's cracked it.  I was wondering about the sub-select route, but
wasn't sure how to code the intersection part.  I just tweaked it to
return the score from the text query.
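
A rough sketch of that tweak, for anyone following along. It assumes the only change is to project ?score out of the text sub-select as well as ?ent. The helper names (isolated, search_query) are made up for illustration, the PREFIX declarations still need to be prepended, and the "rest of query" is cut down to the rdf:type line.

```python
def isolated(variables, pattern):
    """Wrap a graph pattern in a sub-select ending in OFFSET 0.

    OFFSET 0 does not change the results; it is the wrapper suggested in
    this thread to keep the pattern from being merged into the rest of
    the query.
    """
    return "{ SELECT %s { %s } OFFSET 0 }" % (" ".join(variables), pattern)


def escape(text):
    """Minimal escaping for a single-quoted SPARQL string literal."""
    return text.replace("\\", "\\\\").replace("'", "\\'")


def search_query(lat, lon, radius_km, words, lang="en"):
    """Build the SELECT (without PREFIX lines) for a spatial + text search."""
    spatial = "?ent spatial:nearby (%r %r %r 'km') ." % (lat, lon, radius_km)
    text = "(?ent ?score) text:query ('%s' 'lang:%s') ." % (escape(words), lang)
    return "\n".join([
        "SELECT ?score ?ent",
        "WHERE {",
        "  " + isolated(["?ent"], spatial),
        # ?score is projected here so the outer SELECT can see it.
        "  " + isolated(["?ent", "?score"], text),
        "  ?ent rdf:type iotic:Entity .",
        "}",
    ])


if __name__ == "__main__":
    print(search_query(51.507999420166016, -0.10999999940395355,
                       70.8018078804016, "environment"))
```

Without ?score in the inner SELECT list the variable never leaves the sub-select, so the outer ?score would come back unbound.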


Your formulation
200 OK (231 ms)

That's 200 OK by me...
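
In case it is useful for repeating the comparison, here is a minimal sketch of timing a query from the client side using only the Python standard library. The endpoint URL (http://localhost:3030/ds/query) and the function names are assumptions, not our actual setup, and a client-side time includes network overhead, so it will not match the figures in the Fuseki log exactly.

```python
import time
import urllib.parse
import urllib.request


def sparql_request(endpoint, query):
    """Build a SPARQL Protocol POST request (form-encoded 'query' parameter)."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
    )


def timed_run(endpoint, query, opener=urllib.request.urlopen):
    """Send the query and return (HTTP status, elapsed milliseconds).

    The response body is read in full so the time covers result delivery,
    not just the arrival of the first byte.
    """
    request = sparql_request(endpoint, query)
    start = time.perf_counter()
    with opener(request) as response:
        response.read()
        status = response.status
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return status, elapsed_ms


if __name__ == "__main__":
    # Hypothetical endpoint: Fuseki's default port with a dataset named "ds".
    status, ms = timed_run("http://localhost:3030/ds/query",
                           "SELECT * { ?s ?p ?o } LIMIT 1")
    print("%d (%.0f ms)" % (status, ms))
```

Running each formulation several times and ignoring the first run gives a fairer comparison, since the first request also pays for warming caches.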

Enjoy the holidays

Mark

Technology Lead, Iotic Labs
+44 7973 674404
[email protected]
https://www.iotic-labs.com

On 23/12/15 17:03, Andy Seaborne wrote:
> Hi Mark,
> 
> Tricky.
> 
> There isn't a good way to turn off or modify optimization for parts of a
> query without affecting the whole query.  Jena 3.0.1 had a combination
> of changes - hash join but also stronger flattening queries into the
> form you don't want for the first part.
> 
> The best I have come up with is:
> (no special flags needed)
> 
> 
> SELECT ?score ?ent
> WHERE {
>   { SELECT ?ent { ?ent spatial:nearby "ABC" . } OFFSET 0 }
>   { SELECT ?ent { ?ent  text:query "DEF" . }  OFFSET 0 }
>    ... rest of query ...
> 
>   }
> 
> i.e.
> 
> SELECT  ?score ?ent ?entLabel ?lat ?long ?point ?pointType ?pointLabel
> WHERE {
>     { SELECT ?ent {
>         ?ent spatial:nearby(51.507999420166016 -0.10999999940395355
> 70.8018078804016'km') .
>          } OFFSET 0 }
>     { SELECT ?ent {
>         (?ent ?score) text:query ('environment' 'lang:en') .
>         } OFFSET 0 }
> 
>     ?ent rdf:type iotic:Entity
> 
> OPTIONAL {
>     ?ent rdfs:label ?entLabel .
>     FILTER langMatches( lang(?entLabel), 'en' ) .
>     }
> 
>     OPTIONAL {?ent geo:lat ?lat . ?ent geo:long ?long}
>     ?ent iotic:Advertises ?point .
>     ?point rdf:type iotic:Point .
>     ?point iotic:PointType ?pointType .
> 
> OPTIONAL {
>     ?point rdfs:label ?pointLabel .
>     FILTER langMatches( lang(?pointLabel), 'en' ) .
>     }
> }
> 
> 
> On 23/12/15 11:03, Mark Wharton wrote:
>> Hi Andy.
>>
>> More experiments this morning.  I originally only sent you a small part
>> of a larger query just to expose the problem in its simplest form.  And
>> your switches work well in that case (i.e. first formulation below
>> *with* the comments.)
>>
>> But... There's a problem when using the switches in that the rest of the
>> query wants to get the rdfs:label and various other properties.  This
>> destroys the performance gains.
>>
>> I've tried "yours" and "mine" with and without the switches and then the
>> separate parts on their own to see how that goes.
>>
>> 1) "yours"
>> ==========
>> This formulation (with the switches and comments in place) - 384 ms
>>
>> SELECT  ?score ?ent ?entLabel ?lat ?long ?point ?pointType ?pointLabel
>> WHERE {
>>     { ?ent spatial:nearby(51.507999420166016 -0.10999999940395355
>> 70.8018078804016'km') }
>>     { (?ent ?score) text:query ('environment' 'lang:en') .FILTER EXISTS
>> {?ent rdf:type iotic:Entity} }
>>
>> #    OPTIONAL {
>> #        ?ent rdfs:label ?entLabel .
>> #        FILTER langMatches( lang(?entLabel), 'en' ) .
>> #        }
>> #
>> #    OPTIONAL {?ent geo:lat ?lat . ?ent geo:long ?long}
>> #    ?ent iotic:Advertises ?point .
>> #    ?point rdf:type iotic:Point .
>> #    ?point iotic:PointType ?pointType .
>> #
>> #    OPTIONAL {
>> #       ?point rdfs:label ?pointLabel .
>> #       FILTER langMatches( lang(?pointLabel), 'en' ) .
>> #       }
>>
>> }
>>
>> Uncomment the lines and the performance drops to - 7.165 secs
>>
>> 2) "mine"
>> =========
>> The below formulation with the switches in place 11.221 secs
>> The below without the switches. 5.371 secs
>>
>> SELECT  ?score ?ent ?entLabel ?lat ?long ?point ?pointType ?pointLabel
>> WHERE {
>>      ?ent spatial:nearby(51.507999420166016 -0.10999999940395355
>> 70.8018078804016'km') .
>>      (?ent ?score) text:query ('environment' 'lang:en') .FILTER EXISTS
>> {?ent rdf:type iotic:Entity}  .
>>
>> OPTIONAL {
>>      ?ent rdfs:label ?entLabel .
>>      FILTER langMatches( lang(?entLabel), 'en' ) .
>>      }
>>
>>      OPTIONAL {?ent geo:lat ?lat . ?ent geo:long ?long}
>>      ?ent iotic:Advertises ?point .
>>      ?point rdf:type iotic:Point .
>>      ?point iotic:PointType ?pointType .
>>
>> OPTIONAL {
>>      ?point rdfs:label ?pointLabel .
>>      FILTER langMatches( lang(?pointLabel), 'en' ) .
>>      }
>>
>> }
>>
>> 3) Separately
>> ==============
>> Completely on their own:
>> ========================
>> The spatial query on its own (i.e. just the ?ent spatial:nearby line)
>> takes 50 ms
>> and the text query on its own (i.e. just the text:query line)
>> takes 258 ms
>>
>> With the OPTIONAL {} and other properties
>> =========================================
>> Spatial and other properties 135 ms
>> Text and other properties 854 ms
>>
>> Again, repeated thanks for your help.
>>
>> Mark
>>
>> Technology Lead, Iotic Labs
>> [email protected]
>> https://www.iotic-labs.com
>>
>> On 22/12/15 17:22, Andy Seaborne wrote:
>>> Mark,
>>>
>>> Thanks for the experiment results.
>>>
>>> On 22/12/15 15:47, Mark Wharton wrote:
>>>> Query below run without Andy's switches.
>>>>    INFO  [5] 200 OK (4.985 s)
>>>>
>>>> Query below run with Andy's switches.
>>>>    INFO  [1] 200 OK (840 ms)
>>>>
>>>> Them's some magic switches.  Thanks, Andy.
>>>>
>>>> Do they have any impact (negative or positive) on any other SPARQL
>>>> operations?  I'm only curious as you've solved our main problem in that
>>>> our "search" query was very slow.  There's nowhere else that uses the
>>>> text and spatial indexes for retrieval.
>>>
>>> This depends on an internal change in the latest release (Jena 3.0.1,
>>> Fuseki 2.3.1). Prior to that it will not make the same difference.
>>> Specifically, unoptimized joins are now hash joins.
>>>
>>> But that is not a good choice for the "?ent rdf:type iotic:Entity"
>>> triple pattern.  The system can't distinguish different cases involving
>>> external indexes as it does not know much about the index details.
>>>
>>> Adding
>>>
>>> FILTER EXISTS { ?ent rdf:type iotic:Entity }
>>>
>>> might work because the triple pattern is really a check, not a match
>>> setting a variable.
>>>
>>> A plain "?ent rdf:type iotic:Entity" will retrieve all things of that
>>> class regardless of spatial and text query when those optimizations
>>> are off.
>>>
>>>      Andy
>>>
>>>>
>>>> Many thanks for this help so close to the holiday season.  Happy
>>>> holidays to you all at Jena - keep up the good work.
>>>>
>>>> Mark
>>>>
>>>>
>>>> Technology Lead, Iotic Labs
>>>> +44 7973 674404
>>>> [email protected]
>>>> https://www.iotic-labs.com
>>>>
>>>> On 22/12/15 11:49, Andy Seaborne wrote:
>>>>> Mark - here is another way.
>>>>>
>>>>> This query:
>>>>>
>>>>> SELECT ?score ?ent
>>>>> WHERE {
>>>>>      { ?ent spatial:nearby ( .... ) }
>>>>>      { ?ent text:query ( ..... ) }
>>>>>      # No ?ent rdf:type iotic:Entity .
>>>>>      # This focuses the query on the presenting issue.
>>>>> }
>>>>>
>>>>> and then run Fuseki with the following flags:
>>>>>
>>>>>     --set arq:optIndexJoinStrategy=false --set arq:optMergeBGPs=false
>>>>>
>>>>> for however you are running the server.
>>>>>
>>>>> You need both --set flags.
>>>>>
>>>>> The service script will not do this very easily - if environment
>>>>> variable FUSEKI_ARGS is set it might do. Untested.
>>>>>
>>>>> It is easier to run the server standalone:
>>>>>
>>>>> (Linux, Mac)
>>>>>
>>>>> The "fuseki-server" script should pass these in:
>>>>>
>>>>> fuseki-server \
>>>>>     --set arq:optIndexJoinStrategy=false \
>>>>>     --set arq:optMergeBGPs=false \
>>>>>     .. other args ..
>>>>>
>>>>> (Windows or any platform)
>>>>>
>>>>> You can call the server Java code directly, all on one line:
>>>>>
>>>>> java -Xmx1200M -jar fuseki-server.jar --set arq:optIndexJoinStrategy=false --set arq:optMergeBGPs=false .. other args ..
>>>>>
>>>>> You'll need to give the full path name of fuseki-server.jar.
>>>>>
>>>>> Sorry - I don't have your setup to test this fully. I did make sure that
>>>>> the reworked query does lead to an execution plan that is different and
>>>>> should yield some information about the situation.
>>>>>
>>>>>       Andy
>>>>>
>>>>> On 22/12/15 09:50, Andy Seaborne wrote:
>>>>>> On 22/12/15 07:06, Mark Wharton wrote:
>>>>>>> Ah, wheels within wheels.
>>>>>>>
>>>>>>> The formulation with the filter in it is fine, except that if you want
>>>>>>> to search for more than one word or you match in label and comment then
>>>>>>> the UNION formulation returns you duplicate rows.  This isn't a problem
>>>>>>> with the Lucene search which is why (I now remember) I used it in the
>>>>>>> first place.
>>>>>>>
>>>>>>> I'm not sure what version of jena I'm using - I just use the fuseki
>>>>>>> release at 2.3.0.  Is there a way to find out?
>>>>>>
>>>>>> 3.0.0
>>>>>>
>>>>>> Many of the java commands support --version and the fuseki-server jar
>>>>>> is an all-in-one jar:
>>>>>>
>>>>>> java -cp <YourInstall>/fuseki-server.jar arq.sparql --version
>>>>>>
>>>>>>> What's the status on the JENA-999 and JENA-1093 issues?  I see there's
>>>>>>> been some activity on 999 in the last few days. Andy Seaborne's last
>>>>>>> comment seems encouraging.
>>>>>>>
>>>>>>> I don't want to adopt a single version as I'll be stuck forever patching
>>>>>>> back and forward and it will break eventually.
>>>>>>>
>>>>>>> Many thanks for your continued help.
>>>>>>
>>>>>> JENA-999 may sort of help but I'm not that positive because each ?ent
>>>>>> from the first part will be different going into the second part.  It
>>>>>> looks to me as if it is the overhead of going out to Lucene. (This is
>>>>>> Lucene right? not Solr?)
>>>>>>
>>>>>> The ideal is some super compilation of the text:query and spatial query
>>>>>> into one big Lucene query.
>>>>>>
>>>>>> What would also be good is to stop the general optimizer (this is
>>>>>> nothing to do with TDB) using an index join.  Except that is the better
>>>>>> choice for the rdf:type.  This is what the additional {} were trying for,
>>>>>> except the optimizer outsmarted them.
>>>>>>
>>>>>> SELECT ?score ?ent
>>>>>> WHERE {
>>>>>>     ?ent spatial:nearby( ...) .
>>>>>>     (?ent ?score) text:query (...) .
>>>>>>     ?ent rdf:type iotic:Entity .
>>>>>> }
>>>>>>
>>>>>>
>>>>>>
>>>>>> Mark - can you ask the query from Java?  If so,
>>>>>>
>>>>>> Add "Optimize.noOptimizer();" before executing the query.  I can't see
>>>>>> a way to do that from setting the environment for Fuseki.
>>>>>>
>>>>>> Or (the effect on time of this is version specific and whether it does
>>>>>> anything useful is a big "maybe") you could try this:
>>>>>>
>>>>>> SELECT ?score ?ent
>>>>>> WHERE {
>>>>>>     { OPTIONAL { ?ent spatial:nearby "ABC" . }}
>>>>>>     { OPTIONAL { ?ent  text:query "DEF" } }
>>>>>> }
>>>>>>
>>>>>>        Andy
>>>>>>
>>>>>
>>>>>
>>>
>
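
For anyone scripting the standalone launch, the two --set flags quoted earlier in the thread can be assembled into an argument list as below. This is only a sketch: the function name, the jar path and the trailing server arguments are placeholders, and only the flags and the -Xmx1200M heap size come from the thread itself.

```python
# The two ARQ context settings discussed in this thread; both are needed.
ARQ_SETTINGS = [
    ("arq:optIndexJoinStrategy", "false"),
    ("arq:optMergeBGPs", "false"),
]


def fuseki_command(jar_path, other_args=(), heap="1200M"):
    """Return the argv list for running fuseki-server.jar with both flags.

    jar_path should be the full path to fuseki-server.jar; other_args are
    whatever arguments the server is normally started with.
    """
    command = ["java", "-Xmx" + heap, "-jar", jar_path]
    for name, value in ARQ_SETTINGS:
        command += ["--set", "%s=%s" % (name, value)]
    return command + list(other_args)


if __name__ == "__main__":
    # Placeholder path and arguments, for illustration only.
    print(" ".join(fuseki_command("/path/to/fuseki-server.jar",
                                  ["--mem", "/ds"])))
```

Passing the list straight to subprocess.run avoids any shell quoting problems with the colons and equals signs in the setting names.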
