Re: Fuseki: strange (and disappointing) performance when compared to a simple servlet that calls ARQ

François-Paul Servant Mon, 07 Sep 2015 02:50:04 -0700

Andy,


> On 5 Sept. 2015, at 18:18, Andy Seaborne <[email protected]> wrote:
> 
> On 05/09/15 16:19, François-Paul Servant wrote:
>>> On 4 Sept. 2015, at 10:21, Rob Vesse <[email protected]> wrote:
>>> 
>>> You haven't shown your code so I can only guess at what may/may not be
>>> going on
>> 
>> Hi Rob,
>> 
>> Note that while the difference in performance is surprising, and the 
>> most plausible cause is an error on my side, I’m still concerned about 
>> Fuseki’s performance: if it can’t do better with the queries I made, it 
>> doesn’t seem to be a viable solution for me. One of these queries is just 
>> part of what is displayed at:
>> http://www.semanlink.net/tag/linked_data.html
>> (developed years ago with Jena)
>> and I won’t be able to use Fuseki if the response time for such a query is 
>> in the range of seconds.
>> So I hope that the final answer will be: “here is how to use Fuseki 
>> correctly, and then it will be fast” :-)
> 
> It is DESCRIBE that your figures point to, not SELECT, so let's focus on those.

OK.
Note, however, that depending on the query there can also be significant 
differences with SELECT, e.g.:
SELECT ?tag WHERE {
        ?tag skos:broader* tag:science.
}
SIMPLE FIRST CALL: 0.172
SIMPLE MEAN: 0.0225
FUSEKI FIRST CALL: 3.981
FUSEKI MEAN: 3.1274
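
For reference, here is a minimal sketch of how "first call" and "mean" figures of this kind can be computed. It is not the actual test class: the names CallTimer, Timing and measure are hypothetical, the Supplier stands in for whatever performs the HTTP GET and reads the whole body, and the mean is assumed to be taken over all runs, first call included.

```java
import java.util.function.Supplier;

/** Hypothetical timing helper; a sketch, not the test class behind the figures. */
public class CallTimer {

    /** Seconds for the first call, and mean seconds over all calls (first included). */
    public static final class Timing {
        public final double firstCallSeconds;
        public final double meanSeconds;

        Timing(double firstCallSeconds, double meanSeconds) {
            this.firstCallSeconds = firstCallSeconds;
            this.meanSeconds = meanSeconds;
        }
    }

    /**
     * Runs the call the given number of times and times each run.
     * The supplier must return the fully read response body, so that the
     * measured time covers the complete answer and not just its first bytes.
     */
    public static Timing measure(Supplier<String> call, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1");
        }
        double first = 0.0;
        double total = 0.0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            String body = call.get();
            double elapsed = (System.nanoTime() - start) / 1e9;
            if (body == null) {
                throw new IllegalStateException("no response body on run " + i);
            }
            if (i == 0) {
                first = elapsed;
            }
            total += elapsed;
        }
        return new Timing(first, total / runs);
    }
}
```

With the client code I use, a call would look like CallTimer.measure(() -> getIt(uri, client, mediaType, null), 10).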

> 
> Could you please run a profiler on fuseki and run some DESCRIBE tests?

Yes I can, and I did, using jvisualvm (although it doesn’t work with queries 
that are too long). What exactly would you like me to do? I am sending some 
output in a separate message to your address.

> 
> Also - Rob had some questions about the client-side handling of results that 
> are important here.

If I understood correctly, the point is to make sure that the client reads the 
complete answer. Here is the code that I use to read the data from one URI (I 
can send the complete test class if you want).

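import javax.ws.rs.client.Client;
import javax.ws.rs.client.Invocation;
import javax.ws.rs.client.WebTarget;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

/**
 * GET the given URI and return the response body as a string.
 * Adds the elapsed time to the chrono, if one is given.
 */
public static String getIt(String uri, Client client, MediaType mediaType, Chrono ch) {
    if (ch != null) ch.start();
    WebTarget webTarget = client.target(uri);
    Invocation.Builder invocationBuilder = webTarget.request(mediaType);
    // ask for a fresh response rather than a cached one
    invocationBuilder.header("Cache-Control", "no-cache");
    invocationBuilder.header("Pragma", "no-cache");

    Response response = invocationBuilder.get();
    int status = response.getStatus();
    if (status != 200) {
        throw new RuntimeException("Unexpected status: " + status + " getting " + uri); // TODO
    }
    // readEntity reads the whole response body before the chrono is stopped
    String s = response.readEntity(String.class);
    if (ch != null) ch.stop();
    return s;
}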

When the query returns RDF, I then convert the string to a Jena model, check 
that it contains a decent number of triples, and check that my servlet and 
Fuseki return the same number of triples (when they are supposed to, i.e. not 
when the query contains a LIMIT clause).

fps


> 
>       Andy
> 
>> 
>> fps
>> 
>> 
>>> 
>>> Firstly did you actually consume the result set in your servlet?
>>> 
>>> A ResultSet is typically streamed, so the fact that execSelect() returned
>>> doesn't mean the actual query was fully evaluated, simply that the first
>>> result is available.  So if you did something like the following:
>>> 
>>> long start = System.currentTimeMillis();
>>> qe.execSelect();
>>> long elapsed = System.currentTimeMillis() - start;
>>> 
>>> Then all you have measured is the time to first solution, not the time to
>>> get all results. If this is the case, you need to ensure you fully
>>> consume the ResultSet somehow (whether by iterating over it, passing it to
>>> some IO method that writes it out, calling ResultSetFormatter.consume() on
>>> it, etc.), thus forcing ARQ to fully evaluate the query.
>>> 
>>> On the point of IO, did your servlet actually write the results back to
>>> the client? Depending on the size of the results, that can add
>>> significant overhead relative to the actual query execution, and Fuseki is
>>> always going to do this.
>>> 
>>> Finally, most of the queries exhibiting large differences are DESCRIBE
>>> queries, which are evaluated in two passes: first the WHERE clause is
>>> evaluated (via execSelect() internally) and then the description is built.
>>> If your servlet is only calling execSelect() for those queries, then it is
>>> only timing the first pass of the WHERE clause (and possibly subject to
>>> timing only the first result as noted above) rather than timing the full
>>> query evaluation, which Fuseki will be doing.
>>> 
>>> Rob
>>> 
>>> On 03/09/2015 23:19, "François-Paul Servant"
>>> <[email protected]> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> shouldn’t we have the same level of performance with Fuseki and with a
>>>> simple servlet that calls ARQ?
>>>> 
>>>> I hadn’t tried Fuseki until now. Yesterday, I downloaded the 2.3.0 release,
>>>> started the server in a terminal window on my Mac (OS X 10.10.5) with:
>>>> ./fuseki-server --mem /ds
>>>> I uploaded an RDF file (SKOS-like data, 21K triples), and I began to make
>>>> some queries. I’m used to playing with that data in Jena memory models, and
>>>> to querying it. Getting results in the Fuseki GUI seemed slow to me, so I
>>>> decided to compare with a simple servlet that loads a memory model with the
>>>> same data on init, and calls ARQ in its doGet method.
>>>> 
>>>> I loaded both fuseki and my simple servlet in an instance of tomcat 8,
>>>> both loaded with the same data (default graph, memory model), and I
>>>> measured the time for some GET queries as seen by a client I wrote using
>>>> jersey.
>>>> 
>>>> Here are the results. For each SPARQL query, times with the simple
>>>> servlet, and with Fuseki: the time for the first call, and the mean when
>>>> calling it 10 times (with the simple servlet, it is generally much faster
>>>> after the first call, but this is not related to HTTP caching: I paid
>>>> attention to that, and I verified, in the case of the simple servlet, that
>>>> its doGet method actually gets called).
>>>> Depending on the query, differences are small, or huge.
>>>> 
>>>> PREFIX tag: <http://127.0.0.1:8080/fuseki/ds/>
>>>> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
>>>> SELECT ?tag WHERE {
>>>>    ?tag skos:broader tag:semantic_web.
>>>> }
>>>> SIMPLE FIRST CALL: 0.039
>>>> SIMPLE MEAN: 0.0213
>>>> FUSEKI FIRST CALL: 0.025
>>>> FUSEKI MEAN: 0.0215
>>>> 
>>>> PREFIX tag: <http://127.0.0.1:8080/fuseki/ds/>
>>>> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
>>>> DESCRIBE ?tag WHERE {
>>>>    ?tag skos:broader tag:afrique.
>>>> }
>>>> SIMPLE FIRST CALL: 0.039
>>>> SIMPLE MEAN: 0.0216
>>>> FUSEKI FIRST CALL: 0.485
>>>> FUSEKI MEAN: 0.2284
>>>> 
>>>> PREFIX tag: <http://127.0.0.1:8080/fuseki/ds/>
>>>> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
>>>> SELECT ?tag WHERE {
>>>>    ?tag skos:broader* tag:science.
>>>> }
>>>> SIMPLE FIRST CALL: 0.172
>>>> SIMPLE MEAN: 0.0225
>>>> FUSEKI FIRST CALL: 3.981
>>>> FUSEKI MEAN: 3.1274
>>>> 
>>>> PREFIX tag: <http://127.0.0.1:8080/fuseki/ds/>
>>>> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
>>>> DESCRIBE ?tag WHERE {
>>>>    ?tag skos:broader* tag:linked_data.
>>>> }
>>>> SIMPLE FIRST CALL: 0.131
>>>> SIMPLE MEAN: 0.0417
>>>> FUSEKI FIRST CALL: 1.46
>>>> FUSEKI MEAN: 1.3244
>>>> 
>>>> PREFIX tag: <http://127.0.0.1:8080/fuseki/ds/>
>>>> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
>>>> SELECT ?tag WHERE {
>>>>    ?tag a <http://www.semanlink.net/2001/00/semanlink-schema#Tag>.
>>>> }
>>>> LIMIT 1000
>>>> SIMPLE FIRST CALL: 0.07
>>>> SIMPLE MEAN: 0.0269
>>>> FUSEKI FIRST CALL: 0.037
>>>> FUSEKI MEAN: 0.024399999999999998
>>>> 
>>>> PREFIX tag: <http://127.0.0.1:8080/fuseki/ds/>
>>>> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
>>>> DESCRIBE ?tag WHERE {
>>>>    ?tag a <http://www.semanlink.net/2001/00/semanlink-schema#Tag>.
>>>> }
>>>> LIMIT 1000
>>>> SIMPLE FIRST CALL: 0.181
>>>> SIMPLE MEAN: 0.13440000000000002
>>>> FUSEKI FIRST CALL: 6.471
>>>> FUSEKI MEAN: 5.497999999999999
>>>> 
>>>> Do you have an explanation?
>>>> 
>>>> Best Regards,
>>>> 
>>>> fps
>>> 
>>> 
>>> 
>>> 
>> 
>
