Re: Query with spatial and text searches.

Osma Suominen Sun, 20 Dec 2015 23:54:07 -0800

Hi Mark!

I'm not sure that the jena-external-index approach would help. It might or might not, depending on how it's implemented. AFAIK it's just an idea right now; I haven't seen any code.

In any case I think the problem with jena-text and probably jena-spatial too (not very familiar with it) is that they are pretty fast when you can get by doing just a single query with an unbound subject. But when you instead have a fixed subject, or more likely a list of possible subjects, the performance will be very bad because every subject will be queried separately from the Lucene index. This is probably what happens with your query - the results of one query will be fed to the other. JENA-999 tried to address some of this by introducing a cache into jena-text, but the patch that was committed had other problems (it returned the wrong results in some cases) so it was reverted recently and an improved version hasn't yet been developed.
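
To make the difference concrete, here is a toy model in plain Python (not Jena code) that counts index lookups under the two strategies described above. The class and function names are made up for illustration, and the real property functions may well behave differently. The entity counts (450 and 2200) are the ones from Mark's message; the overlap of 50 is an arbitrary assumption.

```python
# Toy model: count index lookups under two evaluation strategies.
# All names are hypothetical; this is not jena-text or jena-spatial code.


class CountingIndex:
    """Stands in for a Lucene-backed index and counts how often it is queried."""

    def __init__(self, matching_subjects):
        self._matches = set(matching_subjects)
        self.lookups = 0

    def query_unbound(self):
        """One index query with an unbound subject: returns every match."""
        self.lookups += 1
        return set(self._matches)

    def query_bound(self, subject):
        """One index query restricted to a single, fixed subject."""
        self.lookups += 1
        return subject in self._matches


def per_subject_join(first, second):
    """Feed each result of `first` into `second`, one lookup per subject."""
    return {s for s in first.query_unbound() if second.query_bound(s)}


def intersect_join(first, second):
    """Query each index once with an unbound subject, then intersect."""
    return first.query_unbound() & second.query_unbound()


def lookups_used(join, first_ids, second_ids):
    """Run `join` on two fresh indexes; return (match count, total lookups)."""
    first, second = CountingIndex(first_ids), CountingIndex(second_ids)
    matches = join(first, second)
    return len(matches), first.lookups + second.lookups


spatial_ids = range(450)        # about 450 entities within the radius
text_ids = range(400, 2600)     # about 2200 entities matching the text query
```

With these sets, `lookups_used(per_subject_join, spatial_ids, text_ids)` needs 451 lookups, the swapped order needs 2201, and `intersect_join` needs only 2. That ordering is at least consistent with the timings Mark reported (swapping the predicates made the query slower), though the model says nothing about actual query times.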


In any case, you could try these variants of the original query:

(I'm really on thin ground here, but I spent a couple of days last week trying to optimize a somewhat similar query involving jena-text and various other conditions, trying to find a solution with good performance)


1. Try to isolate the text and spatial queries with extra braces

SELECT ?score ?ent
WHERE {
 { ?ent spatial:nearby(51.507999420166016 -0.10999999940395355
                    70.01807880401611 'km') }
 { (?ent ?score) text:query ('environment' 'lang:en') }
 ?ent rdf:type iotic:Entity .
}

2. Use a FILTER expression to delay matching the results

SELECT ?score ?ent
WHERE {
 ?ent spatial:nearby(51.507999420166016 -0.10999999940395355
                    70.01807880401611 'km') .
 (?ent2 ?score) text:query ('environment' 'lang:en') .
 FILTER (?ent = ?ent2)
 ?ent rdf:type iotic:Entity .
}

3. Combination of above ideas

SELECT ?score ?ent
WHERE {
 { ?ent spatial:nearby(51.507999420166016 -0.10999999940395355
                    70.01807880401611 'km') }
 { (?ent2 ?score) text:query ('environment' 'lang:en') }
 FILTER (?ent = ?ent2)
 ?ent rdf:type iotic:Entity .
}


-Osma


On 21/12/15 00:28, Mark Wharton wrote:

Hi Marco.

Yes, that's it. The indexes work well in isolation, but don't combine well. 
Smooshing them into a single index would be a great idea, especially if the 
query could resolve both text and spatial predicates with one matching scan of 
the index.

Perhaps Stephen could be persuaded to pick up the pace on this one?

Thanks Mark

On 20 December 2015 12:53:39 GMT+00:00, Marco Neumann <[email protected]> 
wrote:

Yes, correct, Mark. I am only referring to the extra payload here for
invoking the spatial filter in the SPARQL query.

Now that you mention a particular issue with the combined use of both
jena-text and jena-spatial (something I am not aware of), this might
be related to duplicated code in the two projects. Back in May, Stephen
Allen wrote on the dev list that he was about to address some of this,
possibly in a new jena-external-index project.

http://mail-archives.apache.org/mod_mbox/jena-dev/201505.mbox/%3ccaptxtvpwu2ijogyj0kx8o6-07yokk5g1t32b_k3g_cjaqvk...@mail.gmail.com%3E

On Sun, Dec 20, 2015 at 2:29 AM, Mark Wharton
<[email protected]> wrote:

Hi

Thanks for this.  I've read the chapter in the book and now I'm not sure
if I misunderstand your reply or you've only addressed half of the problem.

I'm not worried about the performance of the spatial search in isolation
- that's 97ms which is fine.  The text search on its own takes a bit
longer but that's acceptable, too.

It's when I put the spatial and text *together* that query time increases
by 10-30 times.  That's the bit I don't understand and would like some
help with.

Is there a SPARQL query formulation that can "AND the indexes" rather
than retrieving one set and looping through to retrieve the matches
individually on the other?  (Which is my guess as to how it works.)

Thanks for your help so far.

Mark

Technology Lead, Iotic Labs
[email protected]
https://www.iotic-labs.com

On 18/12/15 18:59, Marco Neumann wrote:

It's a common latency with spatial access methods, in particular for small
data sets. You can try an MBR range query instead.

See Chapter 13, Managing Space and Time, in Semantic Web Programming by
John Hebeler et al., 2009.

On Fri, Dec 18, 2015 at 10:13 AM, Mark Wharton
<[email protected]> wrote:

Hi Jena users.

I'm having performance problems with a query that uses text and
location search.

The query is roughly this:


SELECT ?score ?ent
WHERE {
  ?ent spatial:nearby(51.507999420166016 -0.10999999940395355
                     70.01807880401611 'km') .
  (?ent ?score) text:query ('environment' 'lang:en') .
  ?ent rdf:type iotic:Entity .
}


There are about 450 entities in that radius.
There are about 2200 entities with "environment" in their rdfs:comment.


The query takes 5 seconds.

I've tried this:
Commenting out the text predicate the query takes 97 ms
Commenting out the spatial predicate the query takes 438 ms
Swapping the spatial and text predicates it takes 15 seconds


My question is this...  It looks like the query is separately getting
the results of the first two predicates and merging (somehow) to find
the intersection.  Is there a formulation which will intersect the two
sets faster?

Many TIAs,

Mark
--
Technology Lead, Iotic Labs
[email protected]
https://www.iotic-labs.com




--


---
Marco Neumann
KONA



--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi
