The optimizer likes the query and will execute it in a streaming fashion (meaning the memory footprint is fixed and unrelated to the data size, depending only on the query size), except for the DISTINCT.

The query may have high fan-out effects, leading to many partial duplicates.

For example, if you have:

OPTIONAL{ ?s :p ?v1 }
OPTIONAL{ ?s :q ?v2 }

and there are 5 :p values per ?s and 2 :q values per ?s, then that pair of OPTIONALs generates 10 different rows (the cross product of the ?v1 and ?v2 matches).
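As a rough illustration of that arithmetic, here is a stdlib-only Java sketch (class and method names are hypothetical; it does not use Jena). An OPTIONAL with no match still leaves one row with its variables unbound, so for a single ?s the row count is the product of max(1, matches) over the OPTIONALs. The query quoted further down has many OPTIONALs on the same ?instance, plus `?instance rdf:type ?typeTemp`, which on an inferencing model can also match every superclass, so all of those factors multiply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration only: models how OPTIONAL matches on the same ?s multiply. Not Jena code.
class FanOut {
    // Rows produced for one ?s, given how many values each OPTIONAL matches.
    // An OPTIONAL with zero matches still leaves one row (variables unbound): factor 1.
    static long rowsPerSubject(List<Integer> matchCounts) {
        long rows = 1;
        for (int n : matchCounts) {
            rows *= Math.max(1, n);
        }
        return rows;
    }

    // Explicit join of two OPTIONALs for one ?s; null stands for "unbound".
    static List<String[]> crossProduct(List<String> v1s, List<String> v2s) {
        List<String> left = v1s.isEmpty() ? Collections.<String>singletonList(null) : v1s;
        List<String> right = v2s.isEmpty() ? Collections.<String>singletonList(null) : v2s;
        List<String[]> rows = new ArrayList<>();
        for (String v1 : left) {
            for (String v2 : right) {
                rows.add(new String[] {v1, v2});
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> p = Arrays.asList("p1", "p2", "p3", "p4", "p5"); // 5 values of :p
        List<String> q = Arrays.asList("q1", "q2");                   // 2 values of :q
        System.out.println(crossProduct(p, q).size());                 // 10
        System.out.println(rowsPerSubject(Arrays.asList(5, 2, 0, 3))); // 30
    }
}
```

The count(*) query below is the real measurement; the sketch only shows why that number can be far larger than the number of instances.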

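You say you read the model into memory. Together with the fact that the out-of-memory condition is happening in different places, that suggests overall heap usage is the problem, and each trace just shows wherever the limit happened to be hit.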

It will depend on what 'ontoIn' is - what sort of model is it? An OntModel? And what's the base data stored in? TDB? Memory?

Could you try replacing the SELECT clause with

SELECT (count(*) AS ?c)

and say what the value of ?c is.

        Andy

On 11/06/13 10:05, Brice Sommacal wrote:
Hello Andy,

The query is generated from an XML file.
Once the model is read and available in memory, we create an XML file
for each OWL class with all its properties (see attached for an example).
Then, from this XML file, we generate a SPARQL SELECT query like the one below
and save the results in XML format.
Finally, we apply an XSL transformation to convert the XML file to JSON
format.
Hope it's clear enough.

For the time being, I'm going to start an analysis of directly
populating the Exhibit 3 Staged storage mode without using the XML file
(currently used to convert to JSON for the Exhibit 3 Scripted storage mode).
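For the direct route, here is a minimal stdlib-only sketch of writing result rows straight to JSON, skipping both the XML file and the XSL step. It assumes Exhibit's JSON layout of a top-level "items" array of objects (worth checking against the Exhibit 3 documentation); the class name is hypothetical, and feeding it from a Jena ResultSet (one map per QuerySolution) is left out so the sketch runs without Jena. Only the current row is held in memory.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: stream rows to Exhibit-style JSON ({"items":[...]}) with no XML intermediate.
class ExhibitJsonWriter implements AutoCloseable {
    private final Writer out;
    private boolean first = true;

    ExhibitJsonWriter(Writer out) throws IOException {
        this.out = out;
        out.write("{\"items\":[");
    }

    // Write one row as one item; null values (unbound OPTIONAL variables) are omitted.
    void writeItem(Map<String, String> row) throws IOException {
        if (!first) out.write(',');
        first = false;
        out.write('{');
        boolean firstField = true;
        for (Map.Entry<String, String> e : row.entrySet()) {
            if (e.getValue() == null) continue;
            if (!firstField) out.write(',');
            firstField = false;
            out.write(quote(e.getKey()));
            out.write(':');
            out.write(quote(e.getValue()));
        }
        out.write('}');
    }

    // Closes the JSON document and flushes; the underlying Writer is left open for the caller.
    @Override
    public void close() throws IOException {
        out.write("]}");
        out.flush();
    }

    // Minimal JSON string escaping: quotes, backslash and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        try (ExhibitJsonWriter w = new ExhibitJsonWriter(sw)) {
            Map<String, String> row = new LinkedHashMap<>();
            row.put("label", "Item \"A\"");
            row.put("type", "TCSTANDARDTYPE");
            row.put("description", null); // unbound: skipped
            w.writeItem(row);
        }
        System.out.println(sw);
    }
}
```

In a real run the Writer would wrap a FileOutputStream, so the file grows row by row instead of the whole result set being materialised anywhere.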

Regards,


Brice

Here is the query:

PREFIX : <http://seamless.pco-innovation.com/energy/common/software/tcua#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX afn: <http://jena.hpl.hp.com/ARQ/function#>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>

SELECT DISTINCT ?label ?type ?id ?uri ?tctype__tctypecstattach
?tctype__tcproperty ?tcstdtype__tcclass ?parentTypeName
?inverse_of_primaryTypeName ?typeName ?isAbstract
?tctype__constantattach ?childTypeName ?tctype__tcdisplayrule
?description ?inverse_of_secondaryTypeName ?tctype__tcgrmrule
?sourceTypeName__tccomprule ?tctyper__tcdeepcprule
?destTypeName__tccomprule ?tctype__tcdeepcprule ?tctypeo__tcdeepcprule
?noteTypeName
WHERE{
?instance rdf:type
<http://seamless.pco-innovation.com/energy/common/software/tcu83#TCSTANDARDTYPE>.
?instance rdf:type ?typeTemp.
LET(?type := afn:localname(?typeTemp)).
?instance :inferredLabel ?label.
LET(?id := afn:localname(?instance)).
LET(?uri := fn:concat('../mf/MF.html?graphName=TCSTANDARDTYPE&QName=' ,
afn:localname(?instance))).
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#tctype__tctypecstattach>
?tctype__tctypecstattachNode.
?tctype__tctypecstattachNode :inferredLabel
?labeltctype__tctypecstattachNode.
LET(?tctype__tctypecstattach := str(?labeltctype__tctypecstattachNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#tctype__tcproperty>
?tctype__tcpropertyNode.
?tctype__tcpropertyNode :inferredLabel ?labeltctype__tcpropertyNode.
LET(?tctype__tcproperty := str(?labeltctype__tcpropertyNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#tcstdtype__tcclass>
?tcstdtype__tcclassNode.
?tcstdtype__tcclassNode :inferredLabel ?labeltcstdtype__tcclassNode.
LET(?tcstdtype__tcclass := str(?labeltcstdtype__tcclassNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#parentTypeName>
?parentTypeNameNode.
?parentTypeNameNode :inferredLabel ?labelparentTypeNameNode.
LET(?parentTypeName := str(?labelparentTypeNameNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#inverse_of_primaryTypeName>
?inverse_of_primaryTypeNameNode.
?inverse_of_primaryTypeNameNode :inferredLabel
?labelinverse_of_primaryTypeNameNode.
LET(?inverse_of_primaryTypeName :=
str(?labelinverse_of_primaryTypeNameNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#typeName>
?typeNameTemp.
LET(?typeName := str(?typeNameTemp)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#isAbstract>
?isAbstractTemp.
LET(?isAbstract := str(?isAbstractTemp)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#tctype__constantattach>
?tctype__constantattachNode.
?tctype__constantattachNode :inferredLabel ?labeltctype__constantattachNode.
LET(?tctype__constantattach := str(?labeltctype__constantattachNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#childTypeName>
?childTypeNameNode.
?childTypeNameNode :inferredLabel ?labelchildTypeNameNode.
LET(?childTypeName := str(?labelchildTypeNameNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#tctype__tcdisplayrule>
?tctype__tcdisplayruleNode.
?tctype__tcdisplayruleNode :inferredLabel ?labeltctype__tcdisplayruleNode.
LET(?tctype__tcdisplayrule := str(?labeltctype__tcdisplayruleNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#description>
?descriptionTemp.
LET(?description := str(?descriptionTemp)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#inverse_of_secondaryTypeName>
?inverse_of_secondaryTypeNameNode.
?inverse_of_secondaryTypeNameNode :inferredLabel
?labelinverse_of_secondaryTypeNameNode.
LET(?inverse_of_secondaryTypeName :=
str(?labelinverse_of_secondaryTypeNameNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#tctype__tcgrmrule>
?tctype__tcgrmruleNode.
?tctype__tcgrmruleNode :inferredLabel ?labeltctype__tcgrmruleNode.
LET(?tctype__tcgrmrule := str(?labeltctype__tcgrmruleNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#sourceTypeName__tccomprule>
?sourceTypeName__tccompruleNode.
?sourceTypeName__tccompruleNode :inferredLabel
?labelsourceTypeName__tccompruleNode.
LET(?sourceTypeName__tccomprule :=
str(?labelsourceTypeName__tccompruleNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#tctyper__tcdeepcprule>
?tctyper__tcdeepcpruleNode.
?tctyper__tcdeepcpruleNode :inferredLabel ?labeltctyper__tcdeepcpruleNode.
LET(?tctyper__tcdeepcprule := str(?labeltctyper__tcdeepcpruleNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#destTypeName__tccomprule>
?destTypeName__tccompruleNode.
?destTypeName__tccompruleNode :inferredLabel
?labeldestTypeName__tccompruleNode.
LET(?destTypeName__tccomprule := str(?labeldestTypeName__tccompruleNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#tctype__tcdeepcprule>
?tctype__tcdeepcpruleNode.
?tctype__tcdeepcpruleNode :inferredLabel ?labeltctype__tcdeepcpruleNode.
LET(?tctype__tcdeepcprule := str(?labeltctype__tcdeepcpruleNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#tctypeo__tcdeepcprule>
?tctypeo__tcdeepcpruleNode.
?tctypeo__tcdeepcpruleNode :inferredLabel ?labeltctypeo__tcdeepcpruleNode.
LET(?tctypeo__tcdeepcprule := str(?labeltctypeo__tcdeepcpruleNode)).
}
OPTIONAL{
?instance
<http://seamless.pco-innovation.com/energy/common/software/tcu83#noteTypeName>
?noteTypeNameTemp.
LET(?noteTypeName := str(?noteTypeNameTemp)).
}
}


2013/6/7 Andy Seaborne <[email protected] <mailto:[email protected]>>

    Brice,

    What's the query?

             Andy


    On 07/06/13 08:52, Brice Sommacal wrote:

        Hello,

        The preceding error (XSL transformation) was occurring in my Eclipse
        environment (set with Xmx and Xms at 1024M).
        When I moved my code to a web server environment (set with Xmx and
        Xms at 6000M), the XSL transformation goes well, but I still get a
        Java heap space error:

        java.lang.OutOfMemoryError: Java heap space
        at java.util.ArrayList.<init>(ArrayList.java:112)
        at java.util.ArrayList.<init>(ArrayList.java:119)
        at org.apache.jena.atlas.lib.DS.list(DS.java:54)
        at org.apache.jena.atlas.iterator.IteratorConcat.<init>(IteratorConcat.java:34)
        at org.apache.jena.atlas.iterator.IteratorConcat.concat(IteratorConcat.java:45)
        at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
        at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
        at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
        at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
        at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
        at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
        at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
        at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:77)
        at com.hp.hpl.jena.sparql.engine.binding.BindingProjectBase.actualVars(BindingProjectBase.java:79)
        at com.hp.hpl.jena.sparql.engine.binding.BindingProjectBase.vars1(BindingProjectBase.java:71)
        at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:75)
        at com.hp.hpl.jena.sparql.engine.binding.BindingProjectBase.actualVars(BindingProjectBase.java:79)
        at com.hp.hpl.jena.sparql.engine.binding.BindingProjectBase.vars1(BindingProjectBase.java:71)
        at com.hp.hpl.jena.sparql.engine.binding.BindingBase.vars(BindingBase.java:75)
        at com.hp.hpl.jena.sparql.engine.binding.BindingBase.hashCode(BindingBase.java:199)
        at com.hp.hpl.jena.sparql.engine.binding.BindingBase.hashCode(BindingBase.java:185)
        at java.util.HashMap.put(HashMap.java:372)
        at java.util.HashSet.add(HashSet.java:200)
        at org.apache.jena.atlas.data.SortedDataBag.add(SortedDataBag.java:114)
        at org.apache.jena.atlas.data.DistinctDataNet.netAdd(DistinctDataNet.java:58)
        at com.hp.hpl.jena.sparql.engine.iterator.QueryIterDistinct.isFreshSighting(QueryIterDistinct.java:66)
        at com.hp.hpl.jena.sparql.engine.iterator.QueryIterDistinctReduced.hasNextBinding(QueryIterDistinctReduced.java:61)
        at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
        at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
        at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
        at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
        at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
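This trace runs out of heap inside the DISTINCT step (QueryIterDistinct, then DistinctDataNet, then HashSet.add hashing a binding). To show why DISTINCT is the one non-streaming part, here is a minimal stdlib-only sketch of a streaming DISTINCT in general; it is not Jena's implementation, and the class name is hypothetical. Rows still flow one at a time, but every distinct row seen so far must be remembered, so memory grows with the number of distinct result rows, which OPTIONAL fan-out inflates.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Set;

// Generic illustration of a streaming DISTINCT (not Jena's QueryIterDistinct).
class StreamingDistinct<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Set<T> seen = new HashSet<>(); // grows with the number of distinct rows
    private T pending;
    private boolean ready;

    StreamingDistinct(Iterator<T> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        // Pull from the source until an unseen row turns up or the source is exhausted.
        while (!ready && source.hasNext()) {
            T candidate = source.next();
            if (seen.add(candidate)) { // add() returns false for a duplicate
                pending = candidate;
                ready = true;
            }
        }
        return ready;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        ready = false;
        T result = pending;
        pending = null;
        return result;
    }

    int rememberedRows() {
        return seen.size();
    }

    public static void main(String[] args) {
        Iterator<String> rows = Arrays.asList("r1", "r2", "r1", "r3", "r2").iterator();
        StreamingDistinct<String> distinct = new StreamingDistinct<>(rows);
        while (distinct.hasNext()) {
            System.out.println(distinct.next()); // r1, r2, r3
        }
        System.out.println("remembered: " + distinct.rememberedRows()); // 3
    }
}
```

If duplicates are acceptable downstream, dropping DISTINCT (or using REDUCED, which the SPARQL specification permits but does not require to eliminate duplicates) removes that ever-growing set.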

        Definitely, the XML serialization is not good enough for my use
        case.

        What would be the best solution?
        <quote>
           - Read data from an RDF store (Jena TDB, Sesame) and serve the
             data from a SPARQL endpoint (and apply the XSL on the fly
             [streaming])
           - Convert data from OWL files into an Exhibit table (staged
             mode), i.e. configure the Exhibit storage mode directly.
             (By the way, I haven't yet managed to set up Exhibit 3 Staged
             in a Windows environment.)
           - Read data from an RDF store and create a specific connector
             with the Exhibit API?
        </quote>

        Regards,


        Brice


        2013/6/6 Brice Sommacal <[email protected]
        <mailto:[email protected]>>

            Hi Andy,

            I was using Jena 2.6.4 and have just upgraded to 2.10.1.
            The logs are:
            Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
            at java.util.Arrays.copyOf(Unknown Source)
            at java.util.Arrays.copyOf(Unknown Source)
            at java.util.Vector.ensureCapacityHelper(Unknown Source)
            at java.util.Vector.addElement(Unknown Source)
            at com.sun.org.apache.xml.internal.dtm.ref.sax2dtm.SAX2DTM2.startElement(Unknown Source)
            at com.sun.org.apache.xalan.internal.xsltc.dom.SAXImpl.startElement(Unknown Source)
            at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.startElement(Unknown Source)
            at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
            at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
            at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
            at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
            at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
            at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
            at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
            at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
            at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
            at com.sun.org.apache.xalan.internal.xsltc.trax.TrAXFilter.parse(Unknown Source)
            at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(Unknown Source)
            at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
            at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
            at com.pcoinnovation.genericbrowser.json.FiltreXSL.transformer(FiltreXSL.java:47)

            So now it's not the ResultSetFormatter that runs out of memory
            but the XSL transformation with SAX.
            Thanks Andy for pointing this out.

            There are no parallel requests, because I'm executing them one
            by one and closing the query every time.



            2013/6/6 Andy Seaborne <[email protected]
            <mailto:[email protected]>>

                On 06/06/13 13:52, Brice Sommacal wrote:

                    The XML processing is done inside the ResultSetFormatter
                    class of the Jena API. I'm not sure whether it is parsed
                    with XML DOM or SAX.

                    Logs are here:
                    at org.openjena.atlas.io.IndentedWriter.write(IndentedWriter.java:128)
                    at org.openjena.atlas.io.IndentedWriter.printOneChar(IndentedWriter.java:123)
                    at org.openjena.atlas.io.IndentedWriter.print(IndentedWriter.java:87)
                    at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.printLiteral(XMLOutputResultSet.java:182)
                    at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.printBindingValue(XMLOutputResultSet.java:148)
                    at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.binding(XMLOutputResultSet.java:132)


                    The Jena API provides a way to add the stylesheet inside
                    the XML (xsl:reference) but not to run the XSL over the
                    XML directly. That's why I first write the XML file (a
                    result set serialization) and then run a SAX processor
                    with the stylesheet. The output is a JSON file.


                (version? it's not the current one)

                The ResultSet writing is streaming and not RAM limited. It
                does not use SAX or DOM; it just writes direct output. The
                query may be consuming space (some queries do, especially if
                inferencing is involved, and ontoIn suggests it might be),
                and this just happens to be where the heap limit is hit.

                Processing the XML output may well be memory consuming, but
                that's not Jena.

                Are there parallel requests going on? They all compete for
                RAM.

                          Andy




                    Brice


                    2013/6/6 Claude Warren <[email protected]
                    <mailto:[email protected]>>

                        I have not followed this discussion very closely, so
                        please excuse any items that have already been
                        discussed.

                        You state you are serializing the result set to XML,
                        applying a stylesheet and outputting it as JSON.

                        Does your XML processing use the XML DOM or a SAX
                        processor? (DOM results in a memory footprint of
                        approx. 3x the document size.) You can run the
                        stylesheet processing directly against the SAX
                        processor and have a minimal footprint.

                        Does your stylesheet output the JSON, or do you use
                        an XML to JSON converter? If the latter, does it use
                        (or can it use) streaming like the SAX parser does?
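Brice's later trace shows that feeding the XSLT from a SAX parser does not by itself keep memory low: XSLTC builds an in-memory document model (SAX2DTM2.startElement) as the SAX events arrive. A plain SAX handler that does the conversion itself only ever holds one result row. Here is a minimal stdlib-only sketch, assuming the SPARQL Query Results XML Format (result/binding elements containing uri, literal or bnode); the class name is hypothetical, and literal datatypes and language tags are ignored.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: read a SPARQL XML result file row by row; only the current row is in memory.
class SparqlXmlRowReader extends DefaultHandler {
    private final Consumer<Map<String, String>> sink;
    private Map<String, String> row;
    private String var;
    private StringBuilder text;

    SparqlXmlRowReader(Consumer<Map<String, String>> sink) {
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        switch (localName) {
            case "result":  row = new LinkedHashMap<>(); break;
            case "binding": var = atts.getValue("name"); break;
            case "uri": case "literal": case "bnode": text = new StringBuilder(); break;
            default: break;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) text.append(ch, start, length); // may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        switch (localName) {
            case "uri": case "literal": case "bnode":
                row.put(var, text.toString());
                text = null;
                break;
            case "result":
                sink.accept(row); // hand the finished row on, then forget it
                row = null;
                break;
            default: break;
        }
    }

    static void read(InputStream in, Consumer<Map<String, String>> sink) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so localName is populated
        factory.newSAXParser().parse(in, new SparqlXmlRowReader(sink));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sparql xmlns='http://www.w3.org/2005/sparql-results#'><head/><results>"
            + "<result><binding name='label'><literal>A</literal></binding></result>"
            + "<result><binding name='label'><literal>B</literal></binding></result>"
            + "</results></sparql>";
        List<Map<String, String>> rows = new ArrayList<>();
        read(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), rows::add);
        System.out.println(rows); // [{label=A}, {label=B}]
    }
}
```

The demo collects rows in a list only to print them; in practice each finished row would go straight to a JSON writer so nothing accumulates.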

                        Claude


                        On Thu, Jun 6, 2013 at 1:28 PM, Brice Sommacal
                        <[email protected]> wrote:


                            Hi Olivier,

                            Thanks for the tip about your library; it may be
                            useful one day. Can I have a look at it? I'm
                            wondering how the N3 graph is read (from a file?).
                            Is it possible to use another data source, such as
                            an RDF store?

                            In my case, the code runs inside a Java servlet and
                            I can't set up the application with data from a user
                            interface (IHM), so there is no way to use a
                            JavaScript library (not yet ;-)).

                            Thanks anyway,


                            Brice


                            2013/6/5 Olivier Rossel
                            <[email protected]
                            <mailto:[email protected]>>

                                I have a small JavaScript library that converts
                                an N3 graph into a graph of JavaScript objects.
                                If your problem is related to XML stuff and such
                                a lib could help, let me know.

                                (It might be interesting to contribute it
                                directly to Exhibit, btw.)


                                On Wed, Jun 5, 2013 at 6:13 PM, Brice Sommacal
                                <[email protected]> wrote:



                                    Hello everyone,

                                    I'm facing a "java.lang.OutOfMemoryError: GC
                                    overhead limit exceeded" error and I would like
                                    some advice about how I could optimize my code.

                                    The aim of this method is to run a SPARQL query,
                                    convert the results to XML, and then apply an XSL
                                    stylesheet [1] to write a JSON format (readable by
                                    Exhibit - Scripted [2]).

                                    My piece of code was working well until today.
                                    (I have been trying to query a big model and the
                                    query returns too many results.) This makes my
                                    program break.

                                    <quote>
                                    // Needs java.io.* and com.hp.hpl.jena.query.* (Jena 2.x packages)
                                    Query queryToExec = QueryFactory.create(query, Syntax.syntaxARQ);
                                    QueryExecution qexec = QueryExecutionFactory.create(queryToExec, ontoIn);
                                    File xmlFile = new File(root + "XML/JSON_XML/" + qNameClass + ".xml");
                                    // try-with-resources closes the stream even if serialization fails
                                    try (OutputStream buf = new BufferedOutputStream(new FileOutputStream(xmlFile))) {
                                        ResultSet result = qexec.execSelect();
                                        // Serialization of the result set
                                        ResultSetFormatter.outputAsXML(buf, result);
                                    } catch (Exception e) {
                                        e.printStackTrace();
                                    } finally {
                                        qexec.close();
                                    }
                                    </quote>

                                    I know that writing an XML file uses a lot of
                                    memory...

                                    I was thinking of:
                                        - creating several XML files by tracing the
                                          ResultSetFormatter memory usage (is that
                                          possible?)
                                        - avoiding the XML intermediate format and
                                          writing directly to one or several JSON
                                          files...
                                        - ...

                                    Has anyone found a way to avoid this kind of
                                    error (without increasing Xms/Xmx)?


                                    Thanks in advance,


                                    Brice

                                    [1] http://data-gov.tw.rpi.edu/wiki/Sparqlxml2exhibitjson.xsl
                                    [2] http://www.simile-widgets.org/exhibit3/





