On 11/02/14 11:05, Bill Roberts wrote:
Hi All

Does anyone have plans to implement GeoSPARQL in Jena?  I'm aware of Jena 
Spatial which obviously has many functional similarities, but just wondering if 
there are plans for GeoSPARQL itself?

Thanks

Bill


Hi Bill -

Hope you weren't one of the reviewers for the submission to the W3C/OGC workshop that's coming up! The "why not GeoSPARQL" question came up but, as the submission tries to point out, the work needed to do even a partial GeoSPARQL is not insignificant.

There are non-technical issues as well: support and user questions. Suppose a complete, perfect implementation is released in Jena. Or suppose it's a partial implementation - now there is a need to explain what is and isn't implemented.

As a first step, it needs someone to investigate it properly; it looks to me like something that needs a resource with access to a geospatial expert, at least for advice. It's not in the same league as a one-off patch to ARQ.

jena-spatial was driven by the availability of the geo functions in Lucene. jena-spatial is a self-contained extension, whereas GeoSPARQL needs deep integration into the query engine just to do the same point-in-bounding-box functionality.

From what I can see, there needs to be a community around geospatial data somehow, not just users learning about geospatial data. That would be good to have wherever it is; Jena community, sub project, independent project on github.

GeoSPARQL is a core and a number of extensions. The core is just some class definitions. Jena already supports all the core requirements, as do all general SPARQL engines, but the core on its own does not do anything. It's the various extensions that give the functionality.

GeoSPARQL covers regions and boundaries - for the Topology Vocabulary Extension (section 7) it needs one or more geo-reasoners to provide the topological relations, e.g. geo:sfDisjoint in relation_family=Simple Features. There are also relation_family=Egenhofer and relation_family=RCC8.

The Geometry Extension (section 8) has the interesting part, "Non-topological Query Functions" (section 8.7).

Take function "geof:distance"

    FILTER ( geof:distance(?geoPoints, SomeFixedGeo, units) < 56 )

which is the within-circle function.

If you simply add that function as a custom function to a general-purpose SPARQL engine, then to calculate it you need a full scan of the geo data to find all the ?geoPoints, and then filter them. That's the situation we had pre-jena-spatial. It's slow even on modest data without access to a geospatial index (R-tree, quad-tree, Lucene spatial, whatever).

jena-spatial collects the bounded geospatial access together in one property function. That function asks a geo index, which can find a few points of interest very quickly, and then adds information from the rest of the RDF data.
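
To make the difference concrete, here is a toy sketch in Python (not Jena code, and not how jena-spatial or Lucene work internally). It assumes planar x/y points and a simple uniform grid standing in for a real spatial index; the names within_circle_scan and GridIndex are made up for illustration. The scan tests every point, as a plain FILTER function has to; the index only looks at the grid cells the circle's bounding box touches.

```python
import math
from collections import defaultdict


def within_circle_scan(points, center, radius):
    """Full scan: test every point, as a plain FILTER function must."""
    cx, cy = center
    return [p for p in points if math.hypot(p[0] - cx, p[1] - cy) < radius]


class GridIndex:
    """Toy uniform-grid index (a stand-in for an R-tree, quad-tree, etc.)."""

    def __init__(self, points, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)
        for p in points:
            self.cells[self._cell(p[0], p[1])].append(p)

    def _cell(self, x, y):
        # Map a coordinate to the integer grid cell that contains it.
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def within_circle(self, center, radius):
        cx, cy = center
        # Only visit the cells overlapped by the circle's bounding box.
        x0, y0 = self._cell(cx - radius, cy - radius)
        x1, y1 = self._cell(cx + radius, cy + radius)
        hits = []
        for gx in range(x0, x1 + 1):
            for gy in range(y0, y1 + 1):
                for p in self.cells.get((gx, gy), ()):
                    # Exact distance test on the few candidates found.
                    if math.hypot(p[0] - cx, p[1] - cy) < radius:
                        hits.append(p)
        return hits


points = [(x, y) for x in range(100) for y in range(100)]
index = GridIndex(points, cell_size=10)
nearby = index.within_circle((50, 50), 5)
```

Both approaches return the same points; the index just examines far fewer candidates. Real geo data would also need a proper distance on the sphere and units handling, which this sketch leaves out.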

To do the GeoSPARQL style, you need to pick out from the graph pattern the part where ?geo came from, being careful that the non-geo access patterns are not made inefficient in the process. It's an optimization problem. If the focus is on a geospatial DB, then it's not too bad, but if the RDF database is some geo and a lot of other data, all the optimization choices get mixed up and compete.

There are various other geof:* functions that work on regions; they run into later sections and get more complicated.

The Query Rewrite Extension (section 11) looks fun. It's query rewrite to turn property relationships into primitive data access and custom functions. ARQ can do that but again, what happens when the geo data sits in the context of general data as well?

I haven't found geo libraries to use except spatial4j. There are various ones under the GPL which I haven't tried, and obviously they have consequences for the whole of Jena. There would need to be some kind of geo index, working with the optimizer and data loading. The Lucene spatial index is just point data. An R-tree and regions are needed for more general GeoSPARQL extensions.

So - call to geo-experts - is that a fair assessment? Being wrong about the amount of work needed would be very good news.

        Andy
