Re: Heavy queries followed by light queries

Andy Seaborne Tue, 03 Sep 2013 08:14:13 -0700

On 03/09/13 07:01, nadav hoze wrote:

1. Regarding VM: when I said it varies from client to client, I meant that some use a VM and some don't, but the 12GB is always for a single machine. I also forgot to state that other processes run on that machine besides this service that uses Jena, but this service gets its share, and I don't think it's a lack-of-resources issue.

What I have seen happening on other systems is that the VM configuration is limiting the growth of the VM, causing it to not use as much of the machine as it might.


Can you see that the whole 12G is being used at all?

Network drives don't help.

2. About the matching pattern: here it is again, hope it's OK now (I also attached it):


This :
FILTER NOT EXISTS { ?ontologyConcept schema:isDeleted true }

is better than:

OPTIONAL{?ontologyConcept schema:isDeleted ?ontologyConceptDeleted}

FILTER(!bound(?ontologyConceptDeleted) ||(bound(?ontologyConceptDeleted) && ?ontologyConceptDeleted = false))

Just a short explanation before you read the matching pattern:
This query should fetch all the triples with relation subClassOf to a given ontologyConcept. Its identifiers are @concept.code and @concept.codeSystemId, which are basically placeholders that we replace in our service. The OPTIONAL parts you see in the query are for ignoring concepts which are marked as deleted or not bound to the schema.
?ontologyConcept schema:code @concept.code^^xsd:string .
?ontologyConcept schema:codeSystemId @concept.codeSystemId^^xsd:string
OPTIONAL { ?ontologyConcept schema:isDeleted ?ontologyConceptDeleted }
FILTER(!bound(?ontologyConceptDeleted) || (bound(?ontologyConceptDeleted) && ?ontologyConceptDeleted = false))
{
  ?child relations:subClassOf ?ontologyConcept .
  OPTIONAL { ?child schema:isDeleted ?childDeleted }
  FILTER(!bound(?childDeleted) || (bound(?childDeleted) && ?childDeleted = false))
  ?concept relations:equalsTo ?child .
  OPTIONAL { ?concept schema:isDeleted ?conceptDeleted }
  FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) && ?conceptDeleted = false))
  ?concept rdf:type schema:Concept
}
UNION
{
  ?concept relations:equalsTo ?ontologyConcept .
  ?concept rdf:type schema:Concept
  OPTIONAL { ?concept schema:isDeleted ?conceptDeleted }
  FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) && ?conceptDeleted = false))
}
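
For illustration only, here is a minimal Java sketch of the pattern above with each OPTIONAL/!bound pair swapped for the FILTER NOT EXISTS form suggested earlier in this message, together with the placeholder substitution the service is described as doing. The class name, the method names and the escaping are assumptions, not code from the thread; the prefixes and the SELECT clause were never shown, so they are omitted here too. It assumes schema:isDeleted is stored as an xsd:boolean, and FILTER NOT EXISTS is SPARQL 1.1 syntax.

```java
public class SubclassQueryTemplate {

    // The WHERE-clause body from the thread, rewritten with FILTER NOT EXISTS.
    // Only the deleted-check lines differ from the original pattern.
    static final String PATTERN = String.join("\n",
        "?ontologyConcept schema:code @concept.code^^xsd:string .",
        "?ontologyConcept schema:codeSystemId @concept.codeSystemId^^xsd:string .",
        "FILTER NOT EXISTS { ?ontologyConcept schema:isDeleted true }",
        "{",
        "  ?child relations:subClassOf ?ontologyConcept .",
        "  FILTER NOT EXISTS { ?child schema:isDeleted true }",
        "  ?concept relations:equalsTo ?child .",
        "  ?concept rdf:type schema:Concept .",
        "  FILTER NOT EXISTS { ?concept schema:isDeleted true }",
        "}",
        "UNION",
        "{",
        "  ?concept relations:equalsTo ?ontologyConcept .",
        "  ?concept rdf:type schema:Concept .",
        "  FILTER NOT EXISTS { ?concept schema:isDeleted true }",
        "}");

    // Wrap a value as a SPARQL string literal (minimal escaping: backslash and
    // double quote only; this is an assumption about what the values contain).
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Substitute the two placeholders. The longer placeholder is replaced
    // first because "@concept.code" is a prefix of "@concept.codeSystemId".
    static String fill(String code, String codeSystemId) {
        return PATTERN
            .replace("@concept.codeSystemId", quote(codeSystemId))
            .replace("@concept.code", quote(code));
    }

    public static void main(String[] args) {
        // Hypothetical identifier values, for demonstration only.
        System.out.println(fill("C123", "sys-1"));
    }
}
```

The replacement order matters: substituting @concept.code first would also rewrite the start of @concept.codeSystemId and leave a stray "SystemId" in the query.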
3. About the direct mode: we already use it, so no effect there. Is there a way to clear the memory cache from the model?

No, but I doubt it would make much difference. If you clear the cache, the machine has to go to disk to fetch the data, just as if it were doing cache replacement.



Thanks,

Nadav

On Mon, Sep 2, 2013 at 6:21 PM, Andy Seaborne wrote:


    On 02/09/13 14:33, nadav hoze wrote:

        Machine size: 12 GB
        OS: Windows Server 2008 64 bit


    I don't have much experience of Windows 64 bit and mmap files -
    you may find running with 32 bit mode a useful datapoint (this
    does not use memory mapped files which, from reading around the
    web, and anecdotal evidence on users@, do not have the same
    benefits as on Linux).


        VM: varies from client to client.


    Does this mean that several VMs are running on the same 12G hardware?
    If so, how much RAM is allocated to each VM?


        data (in triples): 20,000,000 (3.6 GB)
        Heap size: 2 GB


    How big does the entire JVM process get?  At that scale, the
    entire DB should be mapped into memory.
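
As a starting point for the question above, a small Java sketch that prints the heap figures the JVM reports about itself. The class and method names are made up for illustration. Note the limitation: memory-mapped files live outside the Java heap, so the total process size being asked about still has to be read from the operating system's own tools; this only shows the heap portion.

```java
public class HeapReport {

    // Convert a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024 * 1024);
    }

    // Summarise the heap as seen by the JVM: used, currently committed,
    // and the configured maximum (the -Xmx setting, roughly).
    static String report() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();
        long committed = rt.totalMemory();
        long used = committed - rt.freeMemory();
        return String.format("heap used=%d MiB, committed=%d MiB, max=%d MiB",
                toMiB(used), toMiB(committed), toMiB(max));
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Calling report() from inside the service before and after a heavy run gives the heap side of the picture; comparing it with the process size shown by the OS indicates how much is being used outside the heap.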


        Driver program : ? (didn't understand)


    You say the test program is using TDB directly so it must be in the
    same JVM.

    It may be useful to you to run on native hardware to see what
    effect VMs are having.  It can range from no measurable effect to
    very significant.


        No the database is on a network shared drive (different server).

        pattern matching (where clause):


    Sorry - this is unreadable and being a partial extract, I can't
    reformat it.

            Andy

        *?ontologyConcept schema:code @concept.code^^xsd:string .*
        *?ontologyConcept schema:codeSystemId
        @concept.codeSystemId^^xsd:string*
        *OPTIONAL{?ontologyConcept schema:isDeleted
        ?ontologyConceptDeleted}
        FILTER(!bound(?ontologyConceptDeleted) ||
        (bound(?ontologyConceptDeleted)
        && ?ontologyConceptDeleted = false))*
        *{*
        * ?child relations:subClassOf ?ontologyConcept .*
        * OPTIONAL{?child schema:isDeleted ?childDeleted}

        FILTER(!bound(?childDeleted) || (bound(?childDeleted) &&
        ?childDeleted =
        false))*
        * ?concept relations:equalsTo ?child .*
        * OPTIONAL{?concept schema:isDeleted ?conceptDeleted}
        FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) &&
        ?conceptDeleted = false))*
        * ?concept rdf:type schema:Concept*
        *}*
        *UNION*
        *{*
        * ?concept relations:equalsTo ?ontologyConcept .*
        * ?concept rdf:type schema:Concept*
        * OPTIONAL{?concept schema:isDeleted ?conceptDeleted}
        FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) &&
        ?conceptDeleted = false))*
        *}*


        basically all this big fuss is to find all child concepts of a
        specified
        parent concept identified by concept.code and
        concept.codeSystemId.
        so the  @concept.code and  @concept.codeSystemId you see are
        replaced in
        runtime to actual values.
        all of the optional sections you see are to ignore deleted
        (logically) or
        not bound concepts.

        Thanks,

        Nadav

        On Mon, Sep 2, 2013 at 4:14 PM, Andy Seaborne wrote:

            On 02/09/13 12:51, nadav hoze wrote:

                hi,

                We are doing stress tests on our service, whose
                underlying data layer is Jena TDB.
                One of our tests is to run heavy queries for a long
                time (about 6 hours) and afterwards run light queries
                (we have clients which are in that mode).
                What we witness is a huge performance degradation:
                light queries which usually took around 0.1-0.2 sec
                took more than 3 seconds after the heavy query
                execution.


            Not surprising - the heavy queries will have taken over the OS
            cache (assuming 64 bit - a similar effect occurs on 32
            bit).  The light-after-heavy is effectively running cold.
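
One way to check the "running cold" explanation is to time the same light query several times in a row right after the heavy run. A rough Java sketch of such a timer follows; the class name is hypothetical and the task in main is only a stand-in for the real light query, which is not shown in the thread. If the first timing is slow and later ones fall back towards the usual 0.1-0.2 sec, that would point at cache warm-up rather than a leak.

```java
import java.util.Arrays;

public class ColdWarmTimer {

    // Run the task `runs` times back to back and record each wall-clock
    // duration in milliseconds, in execution order.
    static long[] timeRepeated(Runnable task, int runs) {
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }

    public static void main(String[] args) {
        // Stand-in workload; replace with the call that executes the
        // light query against the model.
        Runnable lightQuery = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                System.out.println("unreachable");
            }
        };
        System.out.println(Arrays.toString(timeRepeated(lightQuery, 5)));
    }
}
```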

                Also the heavy query execution had a huge performance
                degradation after only one minute:
                each heavy query fetched around  35000 triplets and
                for the first minutes
                it took between 10-40 seconds (which is OK),
                afterwards it peaked to
                200-8000 seconds.
                Same thing memory-wise: after a minute it jumped from
                200 MB to 2.2 GB.

                What I would like to know is if there could be memory
                leak in jena, or
                whether jena objects are cached in some way and maybe
                we can release them.

                Here are important details for answering:
                *jena version: 2.6.4*
                *tdb version: 0.8.9*
                *arq: 2.8.7*
                *we use a single model and no datasets.*


                Also can an upgrade to jena latest stable version help
                us here ?


            You should upgrade anyway. There are bug fixes.  And a
            different license.



                Help is much appreciated :)


            All depends on what the heavy query touches in the
            database (the pattern
            matching part), the size of the machine, whether anything
            else is running
            on the machine, ...

            There are many, many factors:

            What size of the machine?
            What OS?
            Is it a VM?
            How much data (in triples) is there in the DB?
            Heap size?
            Is the driver program on
            the same machine as the database - does this matter?
            ...

                     Andy


                Thanks,


                Nadav
