Re: Different parse times for similar result sets

Burak Yönyül Thu, 28 Feb 2013 04:44:36 -0800

Hi Andy,

The count query executes in 20 milliseconds and returns 71019 as the count.


As I understand it, the query result is computed when the HTTP request
is sent in hasNext(). But the results are not sent back to me immediately;
they are sent to me as I read them with hasNext(). Am I right?

Burak

2013/2/28 Andy Seaborne <[email protected]>

> On 28/02/13 10:29, Burak Yönyül wrote:
>
>> Hi,
>>
>> Thanks for your interest.
>>
>> I know that hasNext() executes the query. When I debug, I see that only
>> one HTTP request is sent to the SPARQL endpoint. So, if there is only one
>> HTTP request, I would expect the first result to take a long time.
>> However, it is reading all the results that takes a long time, not the
>> first one.
>>
>
> There is only one HTTP request - the exec sets it up, and each hasNext
> pulls one result from the (buffered) incoming stream.
>
> The server is sending the results slowly, which (from the description of
> the query), is not unlikely.
>
> (digging around the moderator's mailbox)
>
> The second query is 192K bytes long, almost all a FILTER/IN.  I don't know
> how much
>
> It touches a much larger part of the database than the LIMIT query.
>
> What does this take?
>
> SELECT (count(*) AS ?c)  WHERE {
> ?y <http://www.w3.org/2002/07/owl#sameAs> ?Drug.
> ?Int <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/interactionDrug1> ?y.
> ?Int <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/interactionDrug2> ?IntDrug.
> ?Int <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/text> ?IntEffect.
> }
>
>
>
>         Andy
>
> SELECT  *
> WHERE
>   { ?y <http://www.w3.org/2002/07/owl#sameAs> ?Drug .
>     ?Int <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/interactionDrug1> ?y .
>     ?Int <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/interactionDrug2> ?IntDrug .
>     ?Int <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/text> ?IntEffect
>     FILTER ( ?Drug IN (<http://dbpedia.org/resource/Tetrahydrocannabinol>,
>       <http://dbpedia.org/resource/Mescaline>,
>       <http://dbpedia.org/resource/Bacillus_Calmette-Gu%C3%A9rin>,
>       <http://dbpedia.org/resource/Sugammadex>,
>       <http://dbpedia.org/resource/Carnitine>,
>       <http://dbpedia.org/resource/Mephedrone>,
>
> ... for about 190Kbytes)
>
> the LIMIT query is
>
> SELECT * WHERE {
> ?y <http://www.w3.org/2002/07/owl#sameAs> ?Drug.
> ?Int <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/interactionDrug1> ?y.
> ?Int <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/interactionDrug2> ?IntDrug.
> ?Int <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/text> ?IntEffect.
> } LIMIT 9054
>
>
>> Best,
>> Burak
>>
>> 2013/2/28 Andy Seaborne <[email protected]>
>>
>>
>>     On 28/02/13 09:13, Burak Yönyül wrote:
>>
>>         Hi,
>>
>>         I have a problem with ARQ's parse time for a large result set
>>         (ARQ version 2.9.4).
>>
>>         I execute two queries which produce similar result sets. I
>>         attached them.
>>
>>
>>     Attachments lost.  This list does not support attachments.
>>
>>     You weren't subscribed to the list - I have bcc'ed you on this reply
>>     but you need to subscribe to send and receive messages.
>>
>>     Answer below:
>>
>>
>>         The first one contains a filter block ("one_of_filter" type) and
>>         returns around 9000 results. The second one, on the other hand,
>>         has no filter block and uses a LIMIT to produce the same number
>>         of results as the first query. (Not the same results, only the
>>         same result count.)
>>
>>         The problem is that it takes 60 seconds to parse the result of
>>         the first query, while it takes only 0.4 seconds to parse the
>>         result of the second query. I'm executing these queries against
>>         a SPARQL endpoint for the DrugBank dataset, which is loaded on a
>>         local 4Store server. (Here is the link to download the DrugBank
>>         dataset in question:
>>         https://docs.google.com/file/d/0B3zcaLjQc0_yaEdsTUdxQ1ZJc28/edit?usp=sharing
>>         )
>>
>>         I execute the query with the code given below:
>>
>>         QueryExecution queryExecution =
>>             QueryExecutionFactory.sparqlService("http://localhost:8000/sparql/", query);
>>         ResultSet resultSet = queryExecution.execSelect();
>>         while (resultSet.hasNext()) { resultSet.next(); ... }
>>
>>         Reading the first result takes a similar time for both queries,
>>         but reading the whole result set of the first query takes a long
>>         time. Do you have any idea why the parse times for the two
>>         queries differ?
>>
>>
>>     They don't take the same time - queryExecution.execSelect does not
>>     complete the query - it sets it up.
>>
>>     ARQ is streaming and does not take a copy of the results on
>>     execSelect. The results are pulled in hasNext.  The server is taking
>>     60s for the more expensive query, presumably because it needs to scan
>>     a lot more results to filter them down to 9000.  There are
>>     optimizations that you can do on ?x IN (...) (if that is what a
>>     one_of_filter is) but I don't know if 4Store does them, or can, given
>>     its architecture.
>>
>>     Try this:
>>
>>     ResultSet resultSet = queryExecution.execSelect();
>>     resultSet = ResultSetFactory.copyResults(resultSet) ;
>>
>>
>>     to consume the data immediately before starting the loop.
>>
>>              Andy
>>
>>
>>
>>
>>         Best Regards,
>>         Burak Yönyül
>>
>>
>>
>>
>
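
The streaming behaviour discussed in this thread can be illustrated without a SPARQL endpoint. The sketch below is not ARQ code: SlowRows and StreamingDemo are hypothetical names, and the per-row delay is an assumption standing in for a server that produces results slowly. It only shows that the first hasNext() returns after one row's worth of waiting, while draining the iterator (roughly what copying a result set up front amounts to) pays the whole cost before the loop starts.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a streamed result set: a row "arrives" only when it is pulled. */
final class SlowRows implements Iterator<Integer> {
    private final int total;          // number of rows the simulated server will send
    private final long delayMillis;   // assumed per-row delay on the server side
    private int pulled = 0;           // rows fetched from the simulated stream so far
    private Integer buffered = null;  // row fetched by hasNext() but not yet returned

    SlowRows(int total, long delayMillis) {
        this.total = total;
        this.delayMillis = delayMillis;
    }

    @Override
    public boolean hasNext() {
        if (buffered != null) return true;   // already fetched, no extra wait
        if (pulled >= total) return false;   // stream exhausted
        try {
            Thread.sleep(delayMillis);       // wait for the "server" to send one row
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        buffered = pulled++;
        return true;
    }

    @Override
    public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        Integer row = buffered;
        buffered = null;
        return row;
    }

    int pulled() {
        return pulled;
    }
}

final class StreamingDemo {
    /** Drains the iterator into memory, so all waiting happens here rather than in a later loop. */
    static <T> List<T> copyAll(Iterator<T> rows) {
        List<T> out = new ArrayList<>();
        while (rows.hasNext()) {
            out.add(rows.next());
        }
        return out;
    }

    public static void main(String[] args) {
        SlowRows rows = new SlowRows(200, 5);

        long t0 = System.nanoTime();
        rows.hasNext();                                   // pulls only the first row
        long firstMs = (System.nanoTime() - t0) / 1_000_000;

        List<Integer> all = copyAll(rows);                // pulls everything that remains
        long totalMs = (System.nanoTime() - t0) / 1_000_000;

        System.out.println("first row after " + firstMs + " ms");
        System.out.println(all.size() + " rows after " + totalMs + " ms");
    }
}
```

Timing the loop in the original code therefore measures how fast the endpoint delivers rows, not how fast the client parses them; copying first simply moves that wait in front of the loop.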
