Re: Different parse times for similar result sets

Andy Seaborne Thu, 28 Feb 2013 03:03:14 -0800

On 28/02/13 10:29, Burak Yönyül wrote:

Hi,


Thanks for your interest.

I know that hasNext() executes the query. When I debug, I see that only
one HTTP request is sent to the SPARQL endpoint. So I would expect the first
result to take a long time if there is only one HTTP request. However,
it is getting all the results that takes a long time, not the first one.

There is only one HTTP request - the exec sets it up, and each hasNext pulls one result from the (buffered) incoming stream.

The server is sending the results slowly, which (from the description of the query) is not unlikely.
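
One way to see where the time goes is to time the iteration itself. The following is only a sketch, not ARQ code: the class IterationTimer and its drain method are hypothetical names, and it uses nothing but the Java standard library. Because an ARQ ResultSet is an Iterator<QuerySolution>, the ResultSet returned by execSelect() could be passed straight to drain. If the server is streaming slowly, the time to the first row should come out small and the total time large.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical helper (not part of ARQ): drains any Iterator and records
// how long the first element and the whole iteration took.
public class IterationTimer {

    /** Timings collected while draining an iterator. */
    public static final class Timings {
        public final long nanosToFirst; // elapsed time until the first next() returned
        public final long nanosToAll;   // elapsed time until hasNext() returned false
        public final int count;         // number of elements consumed

        Timings(long nanosToFirst, long nanosToAll, int count) {
            this.nanosToFirst = nanosToFirst;
            this.nanosToAll = nanosToAll;
            this.count = count;
        }
    }

    /** Consumes the iterator completely and returns the measured timings. */
    public static <T> Timings drain(Iterator<T> it) {
        long start = System.nanoTime();
        long first = -1;
        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
            if (count == 1) {
                first = System.nanoTime() - start;
            }
        }
        long all = System.nanoTime() - start;
        if (first < 0) {
            first = all; // empty iterator: there was no first element
        }
        return new Timings(first, all, count);
    }

    public static void main(String[] args) {
        // Stand-in data; with ARQ the argument would be the ResultSet itself.
        Timings t = drain(Arrays.asList("a", "b", "c").iterator());
        System.out.println("rows=" + t.count
                + " first=" + t.nanosToFirst + "ns"
                + " all=" + t.nanosToAll + "ns");
    }
}
```

Running the same measurement against both queries would show whether the 60 seconds is spent waiting for rows to arrive rather than in parsing them.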


(digging around the moderator's mailbox)

The second query is 192K bytes long, almost all a FILTER/IN. I don't know how much


It touches a much larger part of the database than the LIMIT query.

What does this take?

SELECT (count(*) AS ?c)  WHERE {
?y <http://www.w3.org/2002/07/owl#sameAs> ?Drug.

}



        Andy

SELECT  *
WHERE
  { ?y <http://www.w3.org/2002/07/owl#sameAs> ?Drug .

    ?Int <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/interactionDrug1> ?y .
    ?Int <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/interactionDrug2> ?IntDrug .
    ?Int <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/text> ?IntEffect
    FILTER ( ?Drug IN (<http://dbpedia.org/resource/Tetrahydrocannabinol>,
                       <http://dbpedia.org/resource/Mescaline>,
                       <http://dbpedia.org/resource/Bacillus_Calmette-Gu%C3%A9rin>,
                       <http://dbpedia.org/resource/Sugammadex>,
                       <http://dbpedia.org/resource/Carnitine>,
                       <http://dbpedia.org/resource/Mephedrone>,


... for about 190Kbytes)
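
For a list this long, one thing that might be worth trying on the client side is to send the IRIs as a SPARQL 1.1 VALUES block instead of FILTER ... IN. This is not something proposed in this thread, and whether 4Store accepts VALUES or evaluates it any faster is an assumption to test. ValuesRewrite and its methods are hypothetical helper names; the sketch only builds the query string, with the interaction triple patterns left out for brevity.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds a query that restricts ?Drug with an inline
// VALUES block rather than a long FILTER ( ?Drug IN (...) ) list.
public class ValuesRewrite {

    /** Renders "VALUES ?var { <iri1> <iri2> ... }" for the given IRIs. */
    public static String valuesBlock(String var, List<String> iris) {
        StringBuilder sb = new StringBuilder();
        sb.append("VALUES ?").append(var).append(" {");
        for (String iri : iris) {
            sb.append(" <").append(iri).append('>');
        }
        sb.append(" }");
        return sb.toString();
    }

    /** Builds the sameAs query with the drug IRIs supplied as inline data. */
    public static String buildQuery(List<String> drugIris) {
        return "SELECT * WHERE {\n"
             + "  " + valuesBlock("Drug", drugIris) + "\n"
             + "  ?y <http://www.w3.org/2002/07/owl#sameAs> ?Drug .\n"
             + "}";
    }

    public static void main(String[] args) {
        // Two IRIs taken from the query in this thread, as sample input.
        List<String> iris = Arrays.asList(
                "http://dbpedia.org/resource/Tetrahydrocannabinol",
                "http://dbpedia.org/resource/Mescaline");
        System.out.println(buildQuery(iris));
    }
}
```

The IRIs are assumed to be already escaped and free of '>' characters; a real list would need that checked before being spliced into a query.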

the LIMIT query is

SELECT * WHERE {
?y <http://www.w3.org/2002/07/owl#sameAs> ?Drug.

} LIMIT 9054


Best,
Burak

2013/2/28 Andy Seaborne <a...@apache.org <mailto:a...@apache.org>>

    On 28/02/13 09:13, Burak Yönyül wrote:

        Hi,

        I am having a problem about ARQ's parse time of a big result
        set. (ARQ
        version 2.9.4)

        I execute two queries which produce similar result sets. I
        attached them.


    Attachments lost.  This list does not support attachments.

    You weren't subscribed to the list - I have bcc'ed you on this reply
    but you need to subscribe to send and receive messages.

    Answer below:


        The first one contains a filter block ("one_of_filter" type) and
        returns
        around 9000 results. On the other hand, the second one doesn't
        have a
        filter block and has a limit element to produce same number of
        results
        with the former query. (Not the same results, but only the same
        result
        count)

        The problem is that it takes 60 seconds to parse the result of
        the first
        query while it only takes 0.4 seconds to parse the result of the
        second
        query. I'm executing these queries on a sparql endpoint of drugbank
        dataset which is loaded on a local 4Store server. ( Here's the
        link to
        download considered drugbank dataset:
        https://docs.google.com/file/d/0B3zcaLjQc0_yaEdsTUdxQ1ZJc28/edit?usp=sharing
        )

        I execute the query with the code given below:

        QueryExecution queryExecution =
        QueryExecutionFactory.sparqlService("http://localhost:8000/sparql/", query);
        ResultSet resultSet = queryExecution.execSelect();
        while (resultSet.hasNext()) { resultSet.next(); ... }

        Reading the first result of the two queries takes a similar time. But
        reading the whole result set of the first query takes a long time.
        Do you have any idea why the parse times of the two queries differ?


    They don't take the same time - queryExecution.execSelect does not
    complete the query - it sets it up.

    ARQ is streaming and does not take a copy of the results on execSelect.
    The results are pulled in hasNext.  The server is taking
    60s for the more expensive query presumably because it needs to scan
    a lot more results to filter them down to 9000.  There are
    optimizations that you can do on ?x IN (...) (if that is what a
    one_of_filter is) but I don't know if 4Store does them, or can given
    its architecture.

    Try this:

    ResultSet resultSet = queryExecution.execSelect();
    resultSet = ResultSetFactory.copyResults(resultSet) ;

    to consume the data immediately before starting the loop.

             Andy




        Best Regards,
        Burak Yönyül
