Hi Paul,
> My question is: is total query time limited by search execution speed,
> or by marshaling and serialization of search results?
It is a bit of both, but normally the cost is mainly query execution.
It also depends on how the client processes the results.
Some context please:
1/ What's the storage layer?
2/ What result set format are you getting?
3/ How are you handling the results on receipt in the client?
(Håvard's point about seeing the data and the query also applies.)
The important point is that output is streamed.
Results are sent while the query is executing; it is not the case that
the query executes, all the results are calculated, and only then the
results are produced.
To investigate, modify the query to do something like this:
SELECT (count(*) AS ?C) { ... }
because then the result set cost is low and the whole query is executed
before a result can be produced.
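
One way to compare the two timings from the client side is sketched
below. It is only an illustration under assumptions: the endpoint URL,
the helper names and the use of SELECT * for the full query are all
hypothetical, and the graph pattern is whatever sits inside the braces
of your real query. It sends each query over the standard SPARQL
protocol (HTTP GET with a "query" parameter) and reads the whole
response, so the difference between the two times roughly indicates
what result serialization, transfer and client reading add on top of
query execution.

```python
import time
import urllib.parse
import urllib.request

# Hypothetical endpoint URL; replace with your own server's query service.
ENDPOINT = "http://localhost:3030/ds/query"


def count_form(pattern: str) -> str:
    """Wrap a graph pattern so that only a single count row is returned."""
    return "SELECT (count(*) AS ?C) { " + pattern + " }"


def select_form(pattern: str) -> str:
    """Full-results form of the same pattern (SELECT * is an assumption)."""
    return "SELECT * { " + pattern + " }"


def run_query(query: str, endpoint: str = ENDPOINT) -> int:
    """Send the query by HTTP GET, read the whole response, return bytes read."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    total = 0
    with urllib.request.urlopen(request) as response:
        # Read in chunks so a large streamed result is not held in memory.
        while chunk := response.read(65536):
            total += len(chunk)
    return total


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder pattern; substitute the body of the real query.
    pattern = "?s ?p ?o"
    for label, query in (("count", count_form(pattern)),
                         ("full", select_form(pattern))):
        size, seconds = timed(run_query, query)
        print(f"{label}: {size} bytes in {seconds:.2f}s")
```

If the count form takes nearly as long as the full form, the time is
going into query execution; if it is much faster, look at result
format and client handling. Run each form more than once, since the
first run also pays for warming caches.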
Andy
On 06/01/16 16:17, Paul Tyson wrote:
I have a modest (17M triple) dataset, fairly flat graph. I run some
queries selecting nodes with anywhere from 12-20 different property
values.
Result set counts are anywhere from 10,000 to 30,000 nodes. Total
execution times measured at the client are in the 30-40 second range.
The web request begins streaming results immediately, but seems to take
longer than it should (based on the number of results and size of data
transfer). I also notice that the time is roughly linear with the size
of dataset--halving the dataset size halves the result set and the
execution time. I wouldn't have expected this behavior if all the time
was due to an indexed search.
My question is: is total query time limited by search execution speed,
or by marshaling and serialization of search results?
I have tried different query patterns, and believe I have the best
queries possible for the use case.
I'm looking for other suggestions to reduce overall execution time. The
performance does not improve drastically going from 4GB to 8 or 16GB
RAM. My test platforms are 64-bit Windows, ranging from a small server
(16GB RAM, 4 CPUs) to laptops with 4GB RAM.
Thanks,
--Paul