On 02/09/13 14:33, nadav hoze wrote:
Machine size: 12 GB
OS: Windows Server 2008 64 bit
I don't have much experience of Windows 64 bit and mmap files - you may
find running with 32 bit mode a useful datapoint (this does not use
memory mapped files which, from reading around the web, and anecdotal
evidence on users@, do not have the same benefits as on Linux).
VM: varies from client to client.
Does this mean that several VMs are running on the same 12 GB hardware?
If so, how much RAM is allocated to each VM?
data (in triples): 20,000,000 (3.6 GB)
Heap size: 2 GB
How big does the entire JVM process get? At that scale, the entire DB
should be mapped into memory.
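One rough way to separate "heap size" from "process size" is to ask the JVM for its own heap figures and compare them with what the OS reports for the process. The sketch below uses only the standard java.lang.management API; the class name HeapReport is made up for illustration. Memory-mapped files sit outside the Java heap, so in 64 bit mode the process size seen in the OS can be much larger than these numbers.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of Jena: reports Java heap usage only.
final class HeapReport {

    /** Converts a byte count to whole mebibytes (rounds toward zero). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** One-line summary of the current Java heap (excludes memory-mapped files). */
    static String summary() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() may return -1 when no maximum is defined; that prints as 0 MiB here.
        return String.format("heap used=%d MiB, committed=%d MiB, max=%d MiB",
                toMiB(heap.getUsed()), toMiB(heap.getCommitted()), toMiB(heap.getMax()));
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```

Logging this line periodically during the heavy run, next to the process size from the OS, would show whether the growth is in the heap or outside it.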
Driver program : ? (didn't understand)
You say the test program is calling TDB directly, so it must be in the same
JVM.
It may be useful to you to run on native hardware to see what effect
the VMs are having. It can range from no measurable effect to very
significant.
No, the database is on a network shared drive (different server).
pattern matching (where clause):
Sorry - this is unreadable and being a partial extract, I can't reformat it.
Andy
?ontologyConcept schema:code @concept.code^^xsd:string .
?ontologyConcept schema:codeSystemId @concept.codeSystemId^^xsd:string
OPTIONAL{?ontologyConcept schema:isDeleted ?ontologyConceptDeleted}
FILTER(!bound(?ontologyConceptDeleted) || (bound(?ontologyConceptDeleted) && ?ontologyConceptDeleted = false))
{
  ?child relations:subClassOf ?ontologyConcept .
  OPTIONAL{?child schema:isDeleted ?childDeleted}
  FILTER(!bound(?childDeleted) || (bound(?childDeleted) && ?childDeleted = false))
  ?concept relations:equalsTo ?child .
  OPTIONAL{?concept schema:isDeleted ?conceptDeleted}
  FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) && ?conceptDeleted = false))
  ?concept rdf:type schema:Concept
}
UNION
{
  ?concept relations:equalsTo ?ontologyConcept .
  ?concept rdf:type schema:Concept
  OPTIONAL{?concept schema:isDeleted ?conceptDeleted}
  FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) && ?conceptDeleted = false))
}
Basically, all this is to find all child concepts of a specified
parent concept, identified by concept.code and concept.codeSystemId.
The @concept.code and @concept.codeSystemId placeholders you see are
replaced at runtime with actual values.
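The placeholder replacement described above could look roughly like the following sketch. It is plain string handling with no Jena calls, and the class and method names (QueryTemplate, fill, escapeLiteral) are hypothetical, not taken from the actual service. One detail worth noting: @concept.code is a prefix of @concept.codeSystemId, so longer placeholders are replaced first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of runtime placeholder substitution; not the poster's code.
final class QueryTemplate {

    /** Escapes a value for use inside a double-quoted SPARQL string literal. */
    static String escapeLiteral(String value) {
        return value.replace("\\", "\\\\")
                    .replace("\"", "\\\"")
                    .replace("\n", "\\n")
                    .replace("\r", "\\r");
    }

    /**
     * Replaces each placeholder with a quoted, escaped literal.
     * Assumes the values themselves do not contain placeholder text.
     */
    static String fill(String template, Map<String, String> values) {
        List<String> keys = new ArrayList<>(values.keySet());
        // Longest first, so "@concept.code" cannot clobber "@concept.codeSystemId".
        keys.sort((a, b) -> Integer.compare(b.length(), a.length()));
        String result = template;
        for (String key : keys) {
            result = result.replace(key, "\"" + escapeLiteral(values.get(key)) + "\"");
        }
        return result;
    }
}
```

Escaping matters here because an unescaped quote in a code value would change the shape of the query.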
All of the OPTIONAL sections you see are there to exclude logically
deleted concepts while keeping concepts that have no isDeleted value bound.
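The FILTER pattern used in the query, FILTER(!bound(?x) || (bound(?x) && ?x = false)), can be modelled as a small truth table. The sketch below is only an illustration in plain Java, with null standing in for an unbound variable and only boolean values considered; the class name DeletedFilter is made up. It suggests the inner bound(?x) test is redundant, which matches SPARQL's rule that a logical-or is true when either operand is true, even if the other raises an error.

```java
// Hypothetical model of the query's FILTER logic; null represents "unbound".
final class DeletedFilter {

    /** Mirrors FILTER(!bound(?d) || (bound(?d) && ?d = false)). */
    static boolean keepOriginal(Boolean deleted) {
        return deleted == null || (deleted != null && !deleted);
    }

    /** Shorter form, mirroring FILTER(!bound(?d) || ?d = false). */
    static boolean keepSimplified(Boolean deleted) {
        return deleted == null || !deleted;
    }

    public static void main(String[] args) {
        Boolean[] cases = { null, Boolean.FALSE, Boolean.TRUE };
        for (Boolean deleted : cases) {
            System.out.println(deleted + " -> original=" + keepOriginal(deleted)
                    + ", simplified=" + keepSimplified(deleted));
        }
    }
}
```

Whether the shorter form makes any measurable difference to query time is not something this sketch can show.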
Thanks,
Nadav
On Mon, Sep 2, 2013 at 4:14 PM, Andy Seaborne wrote:
On 02/09/13 12:51, nadav hoze wrote:
Hi,
We are stress testing our service, whose underlying data layer
is Jena TDB.
One of our tests is to run heavy queries for a long time (about 6 hours) and
afterwards run light queries (we have clients that work in that mode).
What we see is a huge performance degradation: light queries that
usually take around 0.1-0.2 seconds took more than 3 seconds after the
heavy queries had run.
Not surprising - the heavy queries will have taken over the OS
cache (assuming 64 bit - a similar effect occurs on 32 bit). The
light-after-heavy run is effectively running cold.
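One way to check the cold-cache explanation is to time the same light query several times in a row after the heavy run: if the first execution is slow and later ones return to normal, the cache is warming up again. The sketch below is a generic timing loop using only System.nanoTime; the class name QueryTimer and the stand-in workload are hypothetical, and the Runnable would need to be replaced with code that actually executes a light query.

```java
import java.util.Arrays;

// Hypothetical timing harness; it does not call Jena itself.
final class QueryTimer {

    /** Runs the task the given number of times and returns each elapsed time in milliseconds. */
    static long[] timeRuns(Runnable task, int runs) {
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000L;
        }
        return millis;
    }

    public static void main(String[] args) {
        // Stand-in workload; replace with code that runs one light query to completion.
        Runnable lightQuery = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                System.out.println("unreachable");
            }
        };
        System.out.println(Arrays.toString(timeRuns(lightQuery, 5)));
    }
}
```

Comparing the first element with the rest, before and after the heavy run, gives a simple cold versus warm comparison.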
Also, the heavy query execution itself degraded badly after
only one minute:
each heavy query fetched around 35,000 triples, and for the first minutes
it took 10-40 seconds (which is OK); afterwards it rose to
200-8000 seconds.
Memory behaved the same way: after a minute it jumped from 200 MB to 2.2 GB.
What I would like to know is whether there could be a memory leak in Jena, or
whether Jena objects are cached in some way and maybe we can release them.
Here are important details for answering:
Jena version: 2.6.4
TDB version: 0.8.9
ARQ version: 2.8.7
We use a single model and no datasets.
Also, could upgrading to the latest stable Jena version help us here?
You should upgrade anyway. There are bug fixes. And a different license.
Help is much appreciated :)
All depends on what the heavy query touches in the database (the pattern
matching part), the size of the machine, whether anything else is running
on the machine, ...
There are many, many factors:
What size is the machine?
What OS?
Is it a VM?
How much data (in triples) is there in the DB?
Heap size?
Is the driver program on the same machine as the database - does this matter?
...
Andy
Thanks,
Nadav