Hi,

4store doesn't support VALUES. I executed this query on Virtuoso with
VALUES instead of FILTER, but Virtuoso could not execute such a large
VALUES block (about 4500 elements).

I wrote the FILTER block as:
FILTER(?x = <uri1> || ?x = <uri2> || ... )

That worked, and the query executed in 430 milliseconds. The difference
from the "FILTER IN" version is enormous ("FILTER OR" version: 430
milliseconds; "FILTER IN" version: 60000 milliseconds).
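
For reference, here is a minimal Java sketch of how the three forms discussed in this thread (FILTER with ||, FILTER IN, and VALUES) could be generated from the same list of URIs. FilterBuilder and its method names are hypothetical helpers, not part of Jena, 4store or Virtuoso, and the example.org URIs are placeholders. The sketch assumes a non-empty list of absolute IRIs that need no escaping.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not a Jena, 4store or Virtuoso API.
// Assumes a non-empty list of absolute IRIs that contain no characters needing escaping.
class FilterBuilder {

    // Builds: FILTER(?x = <uri1> || ?x = <uri2> || ...)
    static String orFilter(String var, List<String> uris) {
        return uris.stream()
                .map(u -> "?" + var + " = <" + u + ">")
                .collect(Collectors.joining(" || ", "FILTER(", ")"));
    }

    // Builds: FILTER(?x IN (<uri1>, <uri2>, ...))
    static String inFilter(String var, List<String> uris) {
        return uris.stream()
                .map(u -> "<" + u + ">")
                .collect(Collectors.joining(", ", "FILTER(?" + var + " IN (", "))"));
    }

    // Builds: VALUES ?x { <uri1> <uri2> ... }  (SPARQL 1.1 inline data)
    static String valuesBlock(String var, List<String> uris) {
        return uris.stream()
                .map(u -> "<" + u + ">")
                .collect(Collectors.joining(" ", "VALUES ?" + var + " { ", " }"));
    }

    public static void main(String[] args) {
        // Placeholder URIs; the real query used about 4500 of them.
        List<String> uris = Arrays.asList("http://example.org/a", "http://example.org/b");
        System.out.println(orFilter("x", uris));
        System.out.println(inFilter("x", uris));
        System.out.println(valuesBlock("x", uris));
    }
}
```

Generating all three from one list makes it easy to time the same query against an endpoint in each form.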

Thanks for all your help.

Best,
Burak Yönyül

2013/2/28 Andy Seaborne <[email protected]>

> On 28/02/13 17:22, Stephen Allen wrote:
>
>> The results you are seeing indicate that this is probably 4store
>> executing the query slowly, and not anything to do with the Jena
>> client.  You could even take Jena out of the mix and test getting the
>> results directly from the endpoint:
>>
>>     time curl --data-binary "@query1.txt" \
>>          -H "Content-Type: application/sparql-query" \
>>          "http://localhost:3030/ds/query" > /dev/null
>>
>> Unfortunately, databases are notorious for handling IN clauses poorly
>> (even many SQL databases).  If 4store supports all of SPARQL 1.1, then
>> you can try changing the IN clause to a VALUES clause [1] and see if
>> that helps.
>>
>> -Stephen
>>
>> [1] http://www.w3.org/TR/sparql11-query/#inline-data
>>
>
> or even writing
>
> FILTER(?x = <uri1> || ?x = <uri2> || ... )
>
> which is logically the same but might (just might) trigger the optimizer
> to do something.
>
> But I'm guessing that Stephen's suggestion shows it's how 4Store executes.
>
>
>
>>
>>
>> On Thu, Feb 28, 2013 at 10:30 AM, Burak Yönyül <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> When I reduce the FILTER block, the execution time of the query gets
>>> shorter, but I receive fewer results than with the original query, so
>>> the result set shrinks too.
>>>
>>
> Sounds like it's probing to see if the variable has one of the values.
>
>
>
>>> I recorded the elapsed time of each iteration of the while loop, and
>>> some iterations take noticeably longer than others. The code that
>>> records the times:
>>>
>>> int i = 0;
>>> long before = System.currentTimeMillis();
>>> while (resultSet.hasNext()) {
>>>     i++;
>>>     resultSet.next();
>>>     long after = System.currentTimeMillis();
>>>     // Log how long it took to fetch result number i.
>>>     fileWriter.append("Time of " + i + ". result: " + (after - before) + " ms\n");
>>>     before = System.currentTimeMillis();
>>> }
>>>
>>> The example output:
>>>
>>> Time of 1. result: 4 ms
>>> Time of 2. result: 0 ms
>>> Time of 3. result: 1 ms
>>> ...
>>> Time of 20. result: 14 ms
>>> Time of 21. result: 0 ms
>>> Time of 22. result: 1 ms
>>> Time of 23. result: 1 ms
>>> ...
>>> Time of 27. result: 17 ms
>>> Time of 28. result: 1 ms
>>> ...
>>> Time of 34. result: 10 ms
>>> ... and so on.
>>>
>>
> So the server is sending rows back in bursts - that is not Java GC at 10-20
> rows or the cost of sending the query.  It's 4Store.
>
>
>
>>> But when I execute the LIMIT query, these times are all 0 or 1.
>>>
>>> I don't know why, in the FILTER query, there are time differences when
>>> getting some results. Do you have any idea about that?
>>>
>>
> It really does look like the cost of the FILTER having to get the lexical
> form of the URI to do the comparison on a high number of items.  Maybe also
> probing to see if it is a value, not getting all the choices once per query
> and testing.
>
> (ARQ+TDB can go mad on these as well - it's a tricky thing to optimize in
> all situations.)
>
>         Andy
>
>
>>> Best,
>>> Burak Yönyül
>>>
>>
>
