On 11/02/14 11:05, Bill Roberts wrote:
Hi All
Does anyone have plans to implement GeoSPARQL in Jena? I'm aware of Jena
Spatial which obviously has many functional similarities, but just wondering if
there are plans for GeoSPARQL itself?
Thanks
Bill
Hi Bill -
Hope you weren't one of the reviewers for the submission to the W3C/OGC
workshop that's coming up! The "why not GeoSPARQL" question came up, but
as the submission tries to point out, the work needed to do even a
partial GeoSPARQL implementation is not insignificant.
There are non-technical issues as well, such as support and user
questions. Suppose a complete, perfect implementation is released in
Jena - it still has to be supported. Or suppose it's a partial
implementation - now there is also a need to explain what is and isn't
implemented.
As a first step, it needs someone to investigate it properly; it does
look to me like something that needs resources, with access to a
geospatial expert for at least advice. It's not in the same league as a
one-off patch to ARQ.
jena-spatial was driven by the availability of the geo functions in
Lucene. jena-spatial is a self-contained extension; GeoSPARQL needs
deep integration into the query engine just to do the same
point-in-bounding-box functionality.
From what I can see, there needs to be a community around geospatial
data somehow, not just users learning about geospatial data. That would
be good to have wherever it lives: the Jena community, a sub-project, or
an independent project on GitHub.
GeoSPARQL is a core and a number of extensions. The core is just some
class definitions - Jena already supports all the core requirements, as
do all general SPARQL engines, but the core on its own does not do
anything. It's the various extensions that give the functionality.
GeoSPARQL covers regions and boundaries - for the Topology Vocabulary
Extension (section 7) it needs one or more geo-reasoners to provide the
topological relations, e.g. geo:sfDisjoint in relation_family=Simple
Features. There are also relation_family=Egenhofer and
relation_family=RCC8.
The Geometry Extension (section 8) has the interesting part,
"Non-topological Query Functions" (section 8.7).
Take the function "geof:distance":
FILTER ( geof:distance(?geoPoints, SomeFixedGeo, units) < 56 )
which is the within-circle function.
If you simply add that function as a custom function to a general
purpose SPARQL engine, then to calculate it you need a full scan of the
geo data to find all the ?geoPoints, and then filter them. That's the
situation we had pre-jena-spatial. It's slow even on modest data
without access to a geospatial index (R-tree, quad-tree, Lucene spatial,
whatever).
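To make the full-scan cost concrete, here is a minimal sketch in Python
(not Jena code) of what a naive within-circle filter amounts to. The
point names, coordinates, and the haversine distance on a spherical
Earth are all assumptions for illustration; a real geof:distance would
work on geometry literals and units.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption for this sketch


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_circle_full_scan(points, centre, radius_km):
    """Visit every point and keep those inside the circle: O(n) per query."""
    return {name for name, pos in points.items()
            if haversine_km(pos, centre) < radius_km}


# Hypothetical data standing in for the ?geoPoints bindings.
POINTS = {
    "bristol": (51.45, -2.59),
    "bath": (51.38, -2.36),
    "edinburgh": (55.95, -3.19),
}

nearby = within_circle_full_scan(POINTS, (51.45, -2.59), 56)
```

Every query touches every point, whether or not it is anywhere near the
circle, which is why this does not scale.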
jena-spatial collects the bounded geospatial access together in one
property function. It asks a geo index, which can find a few points of
interest very quickly, and then adds info from the rest of the RDF data.
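As a rough illustration of why an index helps, the sketch below uses a
simple fixed-size grid in Python. This is not how jena-spatial or Lucene
work internally; the GridIndex class, the cell size, and the sample
points are hypothetical, and the bounding box ignores the poles and the
antimeridian. The idea is only that the index narrows the search to a
few candidate cells before the exact distance test runs.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption for this sketch


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


class GridIndex:
    """Toy spatial index: points bucketed into cell_deg x cell_deg cells."""

    def __init__(self, cell_deg=1.0):
        self.cell_deg = cell_deg
        self.cells = defaultdict(list)

    def _cell(self, lat, lon):
        return (math.floor(lat / self.cell_deg),
                math.floor(lon / self.cell_deg))

    def add(self, name, lat, lon):
        self.cells[self._cell(lat, lon)].append((name, (lat, lon)))

    def within_circle(self, centre, radius_km):
        """Return (hits, examined): matches and how many points were tested."""
        lat, lon = centre
        angular = radius_km / EARTH_RADIUS_KM  # radius as an angle in radians
        dlat = math.degrees(angular)
        # Longitude half-width of the circle's bounding box; the min() guard
        # only avoids a domain error, it does not make polar queries correct.
        ratio = min(1.0, math.sin(angular) / math.cos(math.radians(lat)))
        dlon = math.degrees(math.asin(ratio))
        lo_row, lo_col = self._cell(lat - dlat, lon - dlon)
        hi_row, hi_col = self._cell(lat + dlat, lon + dlon)
        hits, examined = set(), 0
        for row in range(lo_row, hi_row + 1):
            for col in range(lo_col, hi_col + 1):
                for name, pos in self.cells.get((row, col), ()):
                    examined += 1  # only candidates in nearby cells
                    if haversine_km(pos, centre) < radius_km:
                        hits.add(name)
        return hits, examined


index = GridIndex(cell_deg=1.0)
index.add("bristol", 51.45, -2.59)
index.add("bath", 51.38, -2.36)
index.add("edinburgh", 55.95, -3.19)

hits, examined = index.within_circle((51.45, -2.59), 56)
```

The returned names would then be joined back to the rest of the RDF
data, which is the part the property function takes care of.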
To do it the GeoSPARQL way, you need to pick out the part of the graph
pattern where ?geo came from, being careful that the non-geo access
patterns are not made inefficient in the process. It's an optimization
problem. If the focus is on a geospatial DB, then it's not too bad, but
if the RDF database is some geo and a lot of other data, all the
optimization choices get mixed up and compete.
There are various other geof:* functions which work on regions, and the
later sections get more complicated.
The Query Rewrite Extension (section 11) looks fun. It's query rewrite
to turn property relationships into primitive data access and custom
functions. ARQ can do that but again, what about when it is used in the
context of general data as well?
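To show the shape of such a rewrite, here is a toy sketch in Python that
works on pattern strings rather than on ARQ's algebra. It is loosely
modelled on a feature-to-feature rule; the rewrite_pattern helper, the
generated variable names, and the choice of geo:hasDefaultGeometry and
geo:asWKT for the data access are assumptions for illustration, not what
a Jena implementation would necessarily do.

```python
import re

# Matches a single triple pattern such as "?a geo:sfDisjoint ?b ."
# Only the Simple Features relations (geo:sf*) are handled in this sketch.
TRIPLE = re.compile(r"^\s*(\?\w+)\s+geo:(sf\w+)\s+(\?\w+)\s*\.?\s*$")


def rewrite_pattern(pattern):
    """Expand a topological triple pattern into data access plus a FILTER.

    Patterns that do not match are returned unchanged.
    """
    m = TRIPLE.match(pattern)
    if m is None:
        return [pattern]
    subj, relation, obj = m.groups()
    g1, g2 = subj + "Geom", obj + "Geom"   # hypothetical helper variables
    w1, w2 = subj + "Wkt", obj + "Wkt"
    return [
        f"{subj} geo:hasDefaultGeometry {g1} .",
        f"{obj} geo:hasDefaultGeometry {g2} .",
        f"{g1} geo:asWKT {w1} .",
        f"{g2} geo:asWKT {w2} .",
        f"FILTER ( geof:{relation}({w1}, {w2}) )",
    ]


rewritten = rewrite_pattern("?a geo:sfDisjoint ?b .")
```

The rewrite itself is mechanical; the hard part is that the resulting
FILTER is back in full-scan territory unless the optimizer can route it
to a geo index.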
I haven't found geo libraries to use except spatial4j. There are
various ones under the GPL which I haven't tried, and obviously they
have consequences for the whole of Jena. There would need to be some
kind of geo index, working with the optimizer and data loading. The
Lucene spatial index is just point data. An R-tree with regions is
needed for more general GeoSPARQL extensions.
So - call to geo-experts - is that a fair assessment? Being wrong about
the amount of work needed would be very good news.
Andy