On 28/02/13 10:29, Burak Yönyül wrote:
Hi,
Thanks for your interest.
I know that hasNext() executes the query. When I debug, I see that only
one HTTP request is sent to the SPARQL endpoint. So, if there is only one
HTTP request, I would expect the first result to take a long time. However,
it is getting all the results that takes a long time, not the first one.
There is only one HTTP request: the exec sets it up, and each hasNext
pulls one result from the (buffered) incoming stream.
The server is sending the results slowly, which, from the description of
the query, is not unlikely.
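To make the streaming behaviour concrete, here is a minimal plain-Java sketch (no Jena involved) of an iterator over a single response stream, where each hasNext() reads one more row. The class name StreamingRows is hypothetical, and the "rows" are plain text lines rather than SPARQL results; it only illustrates why the first row can arrive quickly while the rest arrive as fast as the server produces them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a streaming result set: one "response" (a Reader)
// is opened once, and each hasNext()/next() pulls one more row from it.
public class StreamingRows implements Iterator<String> {
    private final BufferedReader in;
    private String pending;     // row read ahead by hasNext(), not yet returned
    private int rowsRead = 0;   // rows pulled off the stream so far

    public StreamingRows(Reader response) {
        this.in = new BufferedReader(response);
    }

    @Override
    public boolean hasNext() {
        if (pending != null) return true;
        try {
            pending = in.readLine();   // with a real socket, this blocks while the server is slow
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (pending != null) rowsRead++;
        return pending != null;
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        String row = pending;
        pending = null;
        return row;
    }

    public int rowsRead() { return rowsRead; }
}
```

After constructing it over `new StringReader("r1\nr2\nr3\n")`, a single hasNext() call has read only the first row; the remaining rows are read only as the loop asks for them.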
(digging around the moderator's mailbox)
The second query is 192K bytes long, almost all of it a FILTER/IN. I don't
know how much
It touches a much larger part of the database than the LIMIT query.
What does this take?
SELECT (count(*) AS ?c) WHERE {
?y <http://www.w3.org/2002/07/owl#sameAs> ?Drug.
?Int
<http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/interactionDrug1>
?y.
?Int
<http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/interactionDrug2>
?IntDrug.
?Int <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/text>
?IntEffect.
}
Andy
SELECT *
WHERE
{ ?y <http://www.w3.org/2002/07/owl#sameAs> ?Drug .
?Int
<http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/interactionDrug1>
?y .
?Int
<http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/interactionDrug2>
?IntDrug .
?Int
<http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/text> ?IntEffect
FILTER ( ?Drug IN
(<http://dbpedia.org/resource/Tetrahydrocannabinol>,
<http://dbpedia.org/resource/Mescaline>,
<http://dbpedia.org/resource/Bacillus_Calmette-Gu%C3%A9rin>,
<http://dbpedia.org/resource/Sugammadex>,
<http://dbpedia.org/resource/Carnitine>,
<http://dbpedia.org/resource/Mephedrone>,
... for about 190Kbytes)
The LIMIT query is:
SELECT * WHERE {
?y <http://www.w3.org/2002/07/owl#sameAs> ?Drug.
?Int
<http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/interactionDrug1>
?y.
?Int
<http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/interactionDrug2>
?IntDrug.
?Int <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/text>
?IntEffect.
} LIMIT 9054
Best,
Burak
2013/2/28 Andy Seaborne <a...@apache.org>
On 28/02/13 09:13, Burak Yönyül wrote:
Hi,
I am having a problem with ARQ's parse time for a big result set
(ARQ version 2.9.4).
I execute two queries which produce similar result sets. I attached them.
Attachments lost. This list does not support attachments.
You weren't subscribed to the list - I have bcc'ed you on this reply
but you need to subscribe to send and receive messages.
Answer below:
The first one contains a filter block ("one_of_filter" type) and returns
around 9000 results. The second one, on the other hand, has no filter
block and uses a LIMIT to produce the same number of results as the
former query. (Not the same results, only the same result count.)
The problem is that it takes 60 seconds to parse the result of the first
query, while it takes only 0.4 seconds to parse the result of the second
query. I'm executing these queries on a SPARQL endpoint for the drugbank
dataset, loaded into a local 4Store server. (Here's the link to download
the drugbank dataset in question:
https://docs.google.com/file/d/0B3zcaLjQc0_yaEdsTUdxQ1ZJc28/edit?usp=sharing
)
I execute the query with the code given below:
import com.hp.hpl.jena.query.* ;   // ARQ 2.9.x package names

QueryExecution queryExecution =
    QueryExecutionFactory.sparqlService("http://localhost:8000/sparql/", query);
try {
    ResultSet resultSet = queryExecution.execSelect();
    while (resultSet.hasNext()) { resultSet.next(); ... }
} finally {
    queryExecution.close();   // release the HTTP connection
}
Reading the first result of the two queries takes a similar time, but
reading the whole result set of the first query takes a long time. Do you
have any idea why parsing the two result sets differs?
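One way to separate the two timings described here is to record when the first row arrives and when the last one does. The following is a minimal sketch; RowTimer is a hypothetical helper, not part of Jena, and it works on any java.util.Iterator.

```java
import java.util.Iterator;

// Hypothetical helper (not part of Jena): records time-to-first-row and
// time-to-last-row for any iterator, e.g. a streaming result set.
public final class RowTimer {

    // Immutable holder for one measurement.
    public static final class Timings {
        public final long firstRowNanos;   // -1 if there were no rows
        public final long allRowsNanos;
        public final int rows;

        Timings(long firstRowNanos, long allRowsNanos, int rows) {
            this.firstRowNanos = firstRowNanos;
            this.allRowsNanos = allRowsNanos;
            this.rows = rows;
        }
    }

    public static <T> Timings time(Iterator<T> rows) {
        long start = System.nanoTime();
        long first = -1;
        int count = 0;
        while (rows.hasNext()) {   // with a streaming source, each call may wait on the server
            rows.next();
            count++;
            if (count == 1) {
                first = System.nanoTime() - start;
            }
        }
        long all = System.nanoTime() - start;
        return new Timings(first, all, count);
    }

    private RowTimer() {}
}
```

Since Jena's ResultSet extends Iterator<QuerySolution>, the same helper could in principle be passed the result of execSelect(); a large gap between firstRowNanos and allRowsNanos would then point at the server producing rows slowly rather than at ARQ's parser.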
They don't take the same time: queryExecution.execSelect does not
complete the query, it sets it up.
ARQ is streaming and does not take a copy of the results on execSelect.
It reads the results as they are pulled in hasNext. The server is taking
60s for the more expensive query, presumably because it needs to scan
a lot more results to filter them down to 9000. There are
optimizations that you can do on ?x IN (...) (if that is what a
one_of_filter is), but I don't know if 4Store does them, or can, given
its architecture.
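As one illustration of the kind of rewrite possible for ?x IN (...), the candidate drugs can be supplied as inline data with a SPARQL 1.1 VALUES block, so they are joined into the pattern rather than tested by a filter afterwards. For IRIs this gives the same solutions as FILTER(?Drug IN (...)). This is a swapped-in technique, not necessarily the optimization meant above; ValuesQueryBuilder is a hypothetical helper, and whether 4Store accepts VALUES, or evaluates it any faster, is an assumption not established here.

```java
import java.util.List;

// Hypothetical helper: builds the drug-interaction query with the candidate
// drugs given as a VALUES block instead of FILTER(?Drug IN (...)).
// ASSUMPTION: the endpoint supports SPARQL 1.1 VALUES.
public final class ValuesQueryBuilder {
    private static final String DB =
        "http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/";

    public static String build(List<String> drugIris) {
        StringBuilder sb = new StringBuilder();
        sb.append("SELECT * WHERE {\n");
        // Inline data: one solution per candidate drug IRI.
        sb.append("  VALUES ?Drug {");
        for (String iri : drugIris) {
            sb.append(" <").append(iri).append('>');
        }
        sb.append(" }\n");
        // Same triple patterns as the original query.
        sb.append("  ?y <http://www.w3.org/2002/07/owl#sameAs> ?Drug .\n");
        sb.append("  ?Int <").append(DB).append("interactionDrug1> ?y .\n");
        sb.append("  ?Int <").append(DB).append("interactionDrug2> ?IntDrug .\n");
        sb.append("  ?Int <").append(DB).append("text> ?IntEffect .\n");
        sb.append("}\n");
        return sb.toString();
    }

    private ValuesQueryBuilder() {}
}
```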
Try this:
ResultSet resultSet = queryExecution.execSelect();
resultSet = ResultSetFactory.copyResults(resultSet) ;
to consume the data immediately, before starting the loop.
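The effect of copying the results up front can be pictured with a plain-Java analogue. This is a hypothetical sketch, not Jena's implementation: all the waiting on the stream happens inside copyAll(), and the loop afterwards runs over data already in memory. The total time stays the same; it just moves to the copy step.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical analogue of materializing a streaming result set:
// the streaming source is drained completely before any processing starts.
public final class EagerCopy {
    public static <T> List<T> copyAll(Iterator<T> streaming) {
        List<T> rows = new ArrayList<>();
        while (streaming.hasNext()) {   // every network/parse wait happens here
            rows.add(streaming.next());
        }
        return rows;                     // iterating this list is pure in-memory work
    }

    private EagerCopy() {}
}
```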
Andy
Best Regards,
Burak Yönyül