Dan Brickley wrote:
On 6/3/09 10:21, Yrjänä Rankka wrote:
Georgi Kobilarov wrote:
Hi Kingsley,

DESCRIBE <http://dbpedia.org/resource/London> takes 3 minutes to
execute on lod.openlinksw.com ...

It took only a few seconds when I tried it. Takes time to warm up a pan
of this size, as is the case with any DBMS. As the working set
stabilizes in memory, results will come faster.

What's the granularity of the warmup? If, e.g., /resource/Paris hasn't been directly viewed, will it benefit much from general warmup of related resources that are mentioned in the queries for that entity?

Very likely so. Also, in the case of DESCRIBE <http://dbpedia.org/resource/London>, the result of ~13MB takes a while to transfer as well, though not quite 3 minutes, at least not through the pipe I'm connected to.

Here's the explanation of how the read-ahead works straight from the horse's mouth:

In general, looking for resources in a data set improves the working set for that data set. There is some locality based on load order etc.

The disk format is 8K pages, 256 pages per extent of 2MB. The setup has 8 disks and 16 server processes, so disk bandwidth is too narrow. Disk reads are in general done in parallel on all disks.
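
To make the page and extent arithmetic concrete, here is a small sketch in Python. Only the 8K page size and the 256 pages per extent come from the description above; the function names, and the assumption that pages are numbered from 0 with each extent being a contiguous run of pages, are illustrative and not taken from Virtuoso itself.

```python
PAGE_SIZE = 8 * 1024      # 8K pages, as stated above
PAGES_PER_EXTENT = 256    # pages per extent, as stated above

# 8K * 256 = 2MB per extent
EXTENT_SIZE = PAGE_SIZE * PAGES_PER_EXTENT


def extent_of(page_no: int) -> int:
    """Index of the extent holding a page.

    Assumption (hypothetical layout): pages are numbered from 0 and
    each extent is a contiguous run of PAGES_PER_EXTENT pages.
    """
    return page_no // PAGES_PER_EXTENT


def extent_page_range(extent_no: int) -> range:
    """All page numbers belonging to one extent, under the same assumption."""
    first = extent_no * PAGES_PER_EXTENT
    return range(first, first + PAGES_PER_EXTENT)
```

Under these assumptions, pages 0 to 255 fall into extent 0 and page 256 is the first page of extent 1.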

The random access transfer unit is 8K, but if you get two reads hitting the same extent within a second of each other, the whole extent is read sequentially instead of the 2nd single page request. So frequency of access drives bulk prefetching. Then there are cache maintenance policies that differ between just prefetched and actually requested pages. This is a tunable tradeoff between disk throughput and cache pollution.
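
The trigger described above can be sketched roughly as follows. This is only an illustration of the stated rule (a second read in the same extent within one second becomes a whole-extent read), not the actual Virtuoso code. The class name, the method name, and the idea of passing timestamps in explicitly are all assumptions made for the example, and the cache maintenance side is left out entirely.

```python
PAGES_PER_EXTENT = 256   # from the description above
WINDOW_SECONDS = 1.0     # "within a second of each other"


class ReadAhead:
    """Hypothetical sketch of the prefetch decision; not Virtuoso's implementation."""

    def __init__(self):
        # extent index -> time of the most recent read request in that extent
        self.last_hit = {}

    def plan_read(self, page_no, now):
        """Decide what to read for a request for page_no arriving at time now.

        Returns ("page", pages) for a single 8K read, or ("extent", pages)
        when the request is promoted to a sequential read of the whole extent.
        """
        extent = page_no // PAGES_PER_EXTENT
        previous = self.last_hit.get(extent)
        self.last_hit[extent] = now
        if previous is not None and now - previous <= WINDOW_SECONDS:
            first = extent * PAGES_PER_EXTENT
            return ("extent", range(first, first + PAGES_PER_EXTENT))
        return ("page", range(page_no, page_no + 1))


ra = ReadAhead()
first_kind, first_pages = ra.plan_read(10, now=0.0)    # first touch of extent 0
second_kind, second_pages = ra.plan_read(20, now=0.5)  # same extent, 0.5 s later
```

In this sketch the first request is served as a single page and the second is promoted to a read of all 256 pages of the extent, which is how frequent access to one region ends up driving bulk prefetching.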

Virtuoso IO is clever enough. But the fact is that running from memory is 1000+ times faster than from disk on a random access workload and RDF is the very essence of random access.


cheers

Dan

Yrjänä

--
Yrjana Rankka            | [email protected]
Developer, Virtuoso Team | http://www.openlinksw.com
| Making Technology Work For You
