Using Dataset instead of Model worked. I also wanted to ask: could binary operators that can be used in the FILTER clause of a query, such as equal (=) and not equal (!=), be pushed down to the underlying database, instead of fetching the data from the data store and then evaluating the filter condition?
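
For reference, this is roughly what the working setup looks like now, together with the kind of FILTER I have in mind; the assembler file name and the predicate/value URIs below are only placeholders:

import org.apache.jena.query.*;
import org.apache.jena.sdb.SDBFactory;
import org.apache.jena.sdb.Store;

public class SdbDatasetQuery {
    public static void main(String[] args) {
        // Connect to the store described by the assembler file
        // ("sdb.ttl" stands in for the actual file).
        Store store = SDBFactory.connectStore("sdb.ttl");

        // Query the Dataset, not a Model, so SDB sees the whole pattern.
        Dataset dataset = SDBFactory.connectDataset(store);

        // Example of the kind of binary operator in the FILTER clause:
        // ideally this would become a condition in the generated SQL
        // instead of being evaluated after the rows have been fetched.
        String queryString =
            "SELECT ?s ?o WHERE { "
            + "  ?s <http://example.org/p> ?o . "
            + "  FILTER(?o != <http://example.org/excluded>) "
            + "}";

        try (QueryExecution qexec = QueryExecutionFactory.create(queryString, dataset)) {
            ResultSet results = qexec.execSelect();
            while (results.hasNext()) {
                System.out.println(results.next());
            }
        }
        store.close();
    }
}

The generated SQL (as shown by sdbprint in the thread below) would then show whether such a condition ends up in the WHERE clause or not.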
Nikolaos Karalis

> On 10/05/17 17:01, Nikolaos Karalis wrote:
>> I use the following methods to create the connection and the model:
>>
>> myStore = SDBFactory.connectStore(assemblerFile);
>> model = SDBFactory.connectDefaultModel(myStore);
>
> Try
>
>   Dataset dataset = SDBFactory.connectDataset(myStore);
>
> and query the dataset, not a model:
>
>   QueryExecutionFactory.create(..., dataset);
>
> as I guess you are passing the model to QueryExecutionFactory, which
> internally wraps it in a DatasetGraph that SDB will not recognize.
>
>     Andy
>
>> myStore.getTableFormatter().create();
>> model.read(data1);
>> model.read(data2);
>>
>> Nikolaos Karalis
>>
>> On 10/5/2017 18:29, Andy Seaborne wrote:
>>> For some reason, only
>>>
>>> I'd expect the SQL query to be (slightly reformatted output of
>>> sdbprint):
>>>
>>> SELECT                                     -- V_4 = ?o2
>>>   T_1.s AS V_1, T_1.o AS V_2,
>>>   T_2.s AS V_3, T_2.o AS V_4
>>> FROM
>>>   -- ?s1 <http://linkedgeodata.org/ontology/asWKT> ?o1
>>>   Triples AS T_1
>>>   INNER JOIN
>>>   -- ?s2 <http://geo.linkedopendata.gr/gag/ontology/asWKT> ?o2
>>>   Triples AS T_2
>>>   ON ( T_1.p = '<http://linkedgeodata.org/ontology/asWKT>'
>>>    AND T_2.p = '<http://geo.linkedopendata.gr/gag/ontology/asWKT>' )
>>>
>>> Note the INNER JOIN.
>>>
>>> But that can't be used if the query is coming through graph.find, when
>>> only a single triple pattern at a time is presented to the SQL
>>> translation.
>>>
>>> This is what it would look like:
>>>
>>> SELECT
>>>   T_1.s AS V_2
>>> FROM
>>>   -- ?s1 <http://linkedgeodata.org/ontology/asWKT> ?o1
>>>   Triples AS T_1
>>> WHERE ( T_1.p = '<http://linkedgeodata.org/ontology/asWKT>' )
>>>
>>> which is what the log shows.
>>>
>>> So is a graph put into a general dataset, or is this via
>>> DatasetGraphSDB (which triggers the more general processing)?
>>>
>>>     Andy
>>>
>>> On 10/05/17 13:45, Nikolaos Karalis wrote:
>>>> Dear Andy,
>>>>
>>>> thank you for replying to my email. I forked the Jena repository and
>>>> added my changes (https://github.com/nkaralis/jena).
>>>>
>>>> I created three files in the directory layout1:
>>>> FormatterSimpleHive.java, which has the functions needed to create
>>>> the tables Triples and Prefixes;
>>>> StoreSimpleHive.java, which creates a layout1/hive store;
>>>> TupleLoaderSimpleHive.java, which overrides the function load() in
>>>> order to load multiple rows at once. This is a temporary solution.
>>>>
>>>> I also made some changes to the following files in order to support
>>>> the Hive database:
>>>> /store/StoreFactory.java
>>>> /store/DatabaseType.java
>>>> /util/StoreUtils.java
>>>> /sql/JDBC.java
>>>> /compiler/SDBCompile.java
>>>>
>>>> This is the link to the project with the user-defined spatial
>>>> operations: https://github.com/nkaralis/jenaspatial
>>>>
>>>> I also wanted to ask you whether binary operators that can be used in
>>>> the filter clause of a query, such as equal (=) and not equal (!=),
>>>> could be pushed to the underlying database (instead of fetching the
>>>> data from the data store and then evaluating the filter condition).
>>>>
>>>> Best regards,
>>>> Nikolaos Karalis
>>>>
>>>>> Hi Nikolaos,
>>>>>
>>>>> The query pattern generator isn't very sophisticated and is more
>>>>> skewed towards execution where the data is "close" (i.e. there is a
>>>>> cache or a local database).
>>>>>
>>>>> Normally, SDB would send a single SQL query for the two triple
>>>>> patterns and have the SQL database engine worry about how best to
>>>>> execute it.
>>>>>
>>>>> But in the log it seems that this isn't happening:
>>>>>
>>>>> either the query is going through some additional layers, which means
>>>>> the SDB execution engine isn't getting the whole pattern, or the Hive
>>>>> adapter works only on a per-Graph.find basis.
>>>>>
>>>>> Do you have a link to your extended jena-sdb?
>>>>>
>>>>>     Andy
>>>>>
>>>>> On 09/05/17 11:20, Nikolaos Karalis wrote:
>>>>>> Dear Jena developers,
>>>>>>
>>>>>> I have extended jena-sdb in order to support the Hive database and
>>>>>> have also started implementing some user-defined GeoSPARQL functions
>>>>>> using jena-arq.
>>>>>> I ran the following query:
>>>>>>
>>>>>> PREFIX geof: <http://example.org/function#>
>>>>>> SELECT ?s1 ?s2
>>>>>> WHERE {
>>>>>>   ?s1 <http://linkedgeodata.org/ontology/asWKT> ?o1 .
>>>>>>   ?s2 <http://geo.linkedopendata.gr/gag/ontology/asWKT> ?o2 .
>>>>>>   FILTER(geof:sfWithin(?o1, ?o2)) .
>>>>>> }
>>>>>>
>>>>>> and observed that for each iteration of the result set, for each
>>>>>> result for ?s1, ?s2 is computed from scratch. I've attached the logs
>>>>>> of the hiveserver2 as well.
>>>>>> Is there a way to make this query more efficient?
>>>>>>
>>>>>> Best regards,
>>>>>> Nikolaos Karalis
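
As context for the geof:sfWithin filter in the query above, a user-defined ARQ filter function is wired in roughly along these lines. This is only a simplified sketch, not the actual implementation from the jenaspatial project; the geometry test itself is omitted, and the URI simply matches the geof: prefix used in the query:

import org.apache.jena.sparql.expr.NodeValue;
import org.apache.jena.sparql.function.FunctionBase2;
import org.apache.jena.sparql.function.FunctionRegistry;

// Skeleton of a two-argument filter function for ARQ.
public class SfWithin extends FunctionBase2 {
    @Override
    public NodeValue exec(NodeValue wkt1, NodeValue wkt2) {
        // The real check would parse both WKT literals and test
        // whether the first geometry lies within the second.
        boolean within = false; // placeholder for the actual geometry test
        return NodeValue.makeBoolean(within);
    }

    public static void register() {
        // Makes the function available as geof:sfWithin in FILTER expressions.
        FunctionRegistry.get().put("http://example.org/function#sfWithin", SfWithin.class);
    }
}

As far as I understand, ARQ evaluates such filter functions once per candidate solution, so every (?o1, ?o2) pair produced by the join is tested in Java; that is why being able to push the simpler binary conditions down into the SQL would already help.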
