Re: Inefficient Query

Andy Seaborne Wed, 10 May 2017 08:29:47 -0700

For some reason, only


I'd expect the SQL query to be (slightly reformatted output of sdbprint)

SELECT                               -- V_1=?s1 V_2=?o1 V_3=?s2 V_4=?o2
  T_1.s AS V_1, T_1.o AS V_2,
  T_2.s AS V_3, T_2.o AS V_4
FROM
     -- ?s1 <http://linkedgeodata.org/ontology/asWKT> ?o1
    Triples AS T_1
  INNER JOIN
    -- ?s2 <http://geo.linkedopendata.gr/gag/ontology/asWKT> ?o2
    Triples AS T_2
  ON ( T_1.p = '<http://linkedgeodata.org/ontology/asWKT>'
    AND T_2.p = '<http://geo.linkedopendata.gr/gag/ontology/asWKT>'
   )

Note the INNER JOIN.

But that can't be used if the query is coming through graph.find, when only a single triple pattern at a time is presented to the SQL transaction.


This is what it would look like

SELECT
  T_1.s AS V_1, T_1.o AS V_2
FROM
    -- ?s1 <http://linkedgeodata.org/ontology/asWKT> ?o1
    Triples AS T_1
WHERE ( T_1.p = '<http://linkedgeodata.org/ontology/asWKT>'
   )

which is what the log shows.
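
To make the difference concrete, here is a minimal sketch, not SDB code. It assumes an in-memory SQLite table named Triples with columns s, p, o (mirroring the SQL above), made-up subjects and WKT values, and a hypothetical helper run() that records each statement sent. It compares one joined statement against per-pattern lookups of the kind a find-at-a-time execution would issue.

```python
import sqlite3

P1 = "<http://linkedgeodata.org/ontology/asWKT>"
P2 = "<http://geo.linkedopendata.gr/gag/ontology/asWKT>"

# Assumed stand-in for the Triples table; the rows are invented test data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO Triples VALUES (?, ?, ?)",
    [
        ("<a1>", P1, "POINT(1 1)"),
        ("<a2>", P1, "POINT(2 2)"),
        ("<b1>", P2, "POINT(3 3)"),
        ("<b2>", P2, "POINT(4 4)"),
        ("<b3>", P2, "POINT(5 5)"),
    ],
)

sent = []  # every SQL statement handed to the database


def run(sql, params=()):
    """Hypothetical helper: execute one statement and record the round trip."""
    sent.append(sql)
    return conn.execute(sql, params).fetchall()


# (a) Whole pattern in one statement: the database performs the join.
joined = run(
    "SELECT T_1.s, T_1.o, T_2.s, T_2.o "
    "FROM Triples AS T_1 INNER JOIN Triples AS T_2 "
    "ON (T_1.p = ? AND T_2.p = ?)",
    (P1, P2),
)
joined_statements = len(sent)

# (b) One triple pattern at a time: the second pattern is re-queried
# for every row produced by the first.
sent.clear()
nested = []
for s1, o1 in run("SELECT s, o FROM Triples WHERE p = ?", (P1,)):
    for s2, o2 in run("SELECT s, o FROM Triples WHERE p = ?", (P2,)):
        nested.append((s1, o1, s2, o2))
nested_statements = len(sent)

print(joined_statements, nested_statements)
```

Both strategies produce the same rows, but the per-pattern version sends one statement for the first pattern plus one more for each of its results. Against a remote engine such as Hive, each of those extra statements is a separate job.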

So is a graph put into a general dataset, or is this via DatasetGraphSDB (which triggers the more general processing)?


    Andy

On 10/05/17 13:45, Nikolaos Karalis wrote:

Dear Andy,
thank you for replying to my email. I forked the jena repository and added
my changes (https://github.com/nkaralis/jena).
I created three files in the directory layout1:
        FormatterSimpleHive.java, that has the necessary functions in order to
create the tables triples and prefixes
        StoreSimpleHive.java, that creates a layout1/hive store
        TupleLoaderSimpleHive.java, that overrides the function load() in order
to load multiple rows at once. This is a temporary solution.

I also made some changes to the following files:
        /store/StoreFactory.java
        /store/DatabaseType.java
        /util/StoreUtils.java
        /sql/JDBC.java
        /compiler/SDBCompile.java
in order to support the hive database.

This is the link to the project with the user-defined spatial operations:
https://github.com/nkaralis/jenaspatial
I also wanted to ask whether binary operators that can be used in the
filter clause of a query, such as equal (=), not equal (!=), etc., could be
pushed to the underlying database (instead of
fetching the data from the data store and then evaluating the filter
condition).

Best regards,
Nikolaos Karalis
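
As an illustration of the filter pushdown asked about above, the following is a minimal sketch of the idea only, not of how SDB compiles filters. It assumes an in-memory SQLite table named Triples with columns s, p, o and invented rows. It contrasts fetching all matches and filtering in the client with putting the equality test into the SQL WHERE clause.

```python
import sqlite3

P = "<http://linkedgeodata.org/ontology/asWKT>"

# Assumed stand-in for the Triples table; the rows are invented test data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO Triples VALUES (?, ?, ?)",
    [
        ("<a1>", P, "POINT(1 1)"),
        ("<a2>", P, "POINT(2 2)"),
        ("<a3>", P, "POINT(1 1)"),
    ],
)

target = "POINT(1 1)"

# Client-side filtering: every row matching the triple pattern is
# transferred, and the equality test runs after the fetch.
fetched = conn.execute(
    "SELECT s, o FROM Triples WHERE p = ?", (P,)
).fetchall()
client_side = sorted(s for s, o in fetched if o == target)

# Pushed-down filtering: the equality test is part of the SQL, so only
# the rows that pass it are transferred.
pushed = conn.execute(
    "SELECT s FROM Triples WHERE p = ? AND o = ?", (P, target)
).fetchall()
pushed_down = sorted(s for (s,) in pushed)

print(len(fetched), len(pushed))
```

The two approaches return the same subjects here, but the pushed-down query moves fewer rows. Note that SPARQL `=` compares values rather than stored strings, so a real pushdown would have to preserve that semantics; the plain string comparison above is a simplification.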

Hi Nikolaos,

The query pattern generator isn't very sophisticated and is more skewed to
execution where the data is "close" (i.e. there is a cache or local
database).

Normally, SDB would send a single SQL query for the two triple patterns
and have the SQL database engine worry about how best to do this.

But in the log it seems that this isn't happening:

either the query is going through some additional layers that mean the
SDB execution engine isn't getting the whole pattern, or the Hive
adapter works only on a per-Graph.find basis.

Do you have a link to your extended jena-sdb?

     Andy


On 09/05/17 11:20, Nikolaos Karalis wrote:

Dear Jena developers,

I have extended jena-sdb in order to support Hive Database and also
started implementing some user-defined GeoSPARQL functions using
jena-arq.
I ran the following query:

        PREFIX geof: <http://example.org/function#>
        SELECT ?s1 ?s2
        WHERE {
            ?s1 <http://linkedgeodata.org/ontology/asWKT> ?o1 .
            ?s2 <http://geo.linkedopendata.gr/gag/ontology/asWKT> ?o2 .
            FILTER(geof:sfWithin(?o1, ?o2)) .
        }

and observed that, while iterating over the ResultSet, ?s2 is computed
from scratch for each result for ?s1. I've attached the logs of the
hiveserver2 as well.
Is there a way to make this query more efficient?

Best regards,
Nikolaos Karalis
