Hi Rob, and all Exhibit experts,
@Rob: Thanks for explaining how ResultSetFormatter works. But I'm not
sure my question is only about how to write SPARQL query results
in XML format.
My goal is to provide a faceted view (within Exhibit) of all individuals
from a knowledge base (data source format = OWL), ordered by class:
- Read a Jena model from OWL file(s)
- Find all the classes (OWL classes)
- For each of these classes, run a SPARQL query to extract all property
values (object and datatype properties)
- Save the results in an XML file
- Apply an XSL transformation to produce a JSON file
- Filter the temporary JSON file to produce another JSON file
formatted as expected by Exhibit 3 (scripted)
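For context, here is a minimal sketch of the per-class extraction step. This is not my actual code: the query shape, class URI, and output path are illustrative.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;

import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.ResultSetFormatter;
import com.hp.hpl.jena.rdf.model.Model;

public class ClassExtractor {

    // Runs one SELECT per class and writes the bindings as SPARQL/XML.
    public static void extractClass(Model model, String classUri, String outPath)
            throws Exception {
        String query = "SELECT ?s ?p ?o WHERE { ?s a <" + classUri + "> . ?s ?p ?o }";
        QueryExecution qexec = QueryExecutionFactory.create(QueryFactory.create(query), model);
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(outPath))) {
            // The query is only evaluated here, while outputAsXML iterates the ResultSet.
            ResultSetFormatter.outputAsXML(out, qexec.execSelect());
        } finally {
            qexec.close();
        }
    }
}
```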
Recently, I had to read a model of about 300,000 triples, and for a
certain class, the result set (obtained via a SPARQL SELECT) could not be
written to an XML file with the Jena API (error: java.lang.OutOfMemoryError:
GC overhead limit exceeded) [1].
I don't want to allocate more memory to my JVM (Xmx and Xms are
already fixed in Tomcat).
By the way, all this code runs in a Java servlet. The data are read
from a data source folder before Tomcat starts.
So, I wonder whether it would be better to:
- Read data from an RDF store (Jena TDB, Sesame) and serve the data
through a SPARQL endpoint (applying the XSL on the fly, i.e. streaming)
- Convert data from the OWL files into an Exhibit table (staged mode),
i.e. configure Exhibit's storage mode directly
(by the way, I haven't managed to set up Exhibit 3 staged in
a Windows environment yet)
- Read data from an RDF store and create a specific connector with the
Exhibit API?
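To make the first option concrete, here is a minimal sketch assuming Jena TDB; the dataset path and query are placeholders:

```java
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.ResultSetFormatter;
import com.hp.hpl.jena.tdb.TDBFactory;

public class TdbStreaming {

    public static void main(String[] args) {
        // Open (or create) a persistent, disk-backed TDB dataset so the data
        // does not have to fit into the servlet's heap. Path is illustrative.
        Dataset dataset = TDBFactory.createDataset("target/tdb-demo");
        String q = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10";
        QueryExecution qexec = QueryExecutionFactory.create(QueryFactory.create(q), dataset);
        try {
            // Bindings are pulled lazily from disk and written out as they arrive.
            ResultSetFormatter.outputAsXML(System.out, qexec.execSelect());
        } finally {
            qexec.close();
        }
    }
}
```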
Well, I hope some of you can share your advice, because I don't know
which solution would best fit my needs.
Regards,
Brice
[1] java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.nio.CharBuffer.wrap(CharBuffer.java:350)
at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:246)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:106)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:94)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:177)
at java.io.PrintWriter.write(PrintWriter.java:361)
at org.openjena.atlas.io.IndentedWriter.write(IndentedWriter.java:128)
at org.openjena.atlas.io.IndentedWriter.printOneChar(IndentedWriter.java:123)
at org.openjena.atlas.io.IndentedWriter.print(IndentedWriter.java:87)
at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.printLiteral(XMLOutputResultSet.java:182)
at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.printBindingValue(XMLOutputResultSet.java:148)
at com.hp.hpl.jena.sparql.resultset.XMLOutputResultSet.binding(XMLOutputResultSet.java:132)
at com.hp.hpl.jena.sparql.resultset.ResultSetApply.apply(ResultSetApply.java:37)
at com.hp.hpl.jena.sparql.resultset.XMLOutput.format(XMLOutput.java:39)
at com.hp.hpl.jena.query.ResultSetFormatter.outputAsXML(ResultSetFormatter.java:427)
at com.hp.hpl.jena.query.ResultSetFormatter.outputAsXML(ResultSetFormatter.java:405)
at com.pcoinnovation.genericbrowser.json.ParaFileToJSON.<init>(ParaFileToJSON.java:163)
at fr.pcoinnovation.websemantic.gb.servlets.InitGraphValues.initGB(InitGraphValues.java:232)
2013/6/5 Rob Vesse <[email protected]>
> Hi Brice
>
> Writing SPARQL/XML should be entirely streaming and in principle has
> minimal memory overhead.
>
> The most likely culprit of an OOM is the query execution itself, since
> depending on the query the system may need to hold a lot of intermediate
> data in memory. You haven't said anything about what your query is or
> what your data is, e.g. how large it is, or whether it is in-memory or in
> some persistent triple store. Also, I am guessing from your variable names
> that you are querying an OntModel/InfModel, which may itself cause a lot
> of memory usage during query execution if rules have to be evaluated.
>
> The ResultSet object you get when you call execSelect() is essentially a
> thin wrapper over a QueryIterator, which is itself essentially a wrapper
> over the query plan for answering the query. Until you start iterating
> over the ResultSet (and thus the underlying QueryIterator) the query
> engine does not actually do any work. When you call
> ResultSetFormatter.outputAsXML() it starts iterating and thus evaluating
> the query, which is why you get the OOM at this line (you didn't say so
> directly in your email, but I will assume this from your assertion that
> outputting the XML is the problem).
>
> If you need further help with this, it is much easier for us to help you
> if you provide a complete minimal example, i.e. one that includes code,
> query and data, plus the error message with stack trace.
>
> Rob
>
>
>
> On 6/5/13 9:13 AM, "Brice Sommacal" <[email protected]> wrote:
>
> >Hello everyone,
> >
> >I'm facing a "java.lang.OutOfMemoryError: GC overhead limit exceeded"
> >error, and I would like advice on how I could optimize my code.
> >
> >The aim of this method is to run a SPARQL query, convert the results
> >into XML format, and then apply an XSL stylesheet [1] to produce a JSON
> >format readable by Exhibit - Scripted [2].
> >
> >My piece of code was working well until today, when I tried to query
> >a big model and the query returned too many results.
> >This makes my program crash.
> >
> ><quote>
> >Query queryToExec = QueryFactory.create(query, Syntax.syntaxARQ);
> >QueryExecution qexec = QueryExecutionFactory.create(queryToExec, ontoIn);
> >BufferedOutputStream buf = null;
> >try {
> >    ResultSet result = qexec.execSelect();
> >    buf = new BufferedOutputStream(new FileOutputStream(new File(root +
> >        "XML/JSON_XML/" + qNameClass + ".xml")));
> >    // Serialization of the ResultSet (the query is actually evaluated here)
> >    ResultSetFormatter.outputAsXML(buf, result);
> >} catch (Exception e) {
> >    e.printStackTrace();
> >} finally {
> >    if (buf != null) {
> >        try { buf.close(); } catch (IOException ignored) {}
> >    }
> >    qexec.close();
> >}
> ></quote>
> >
> >I know that writing the XML file uses a lot of memory...
> >
> >I was thinking of:
> > - creating several XML files by tracing the ResultSetFormatter memory
> >usage (is that possible?)
> > - avoiding the intermediate XML format and writing directly to one or
> >several JSON files
> > - ...
> >
> >
> >Has anyone found a way to avoid this kind of error (without
> >increasing Xms/Xmx)?
> >
> >Thanks in advance,
> >
> >
> >Brice
> >
> >[1] http://data-gov.tw.rpi.edu/wiki/Sparqlxml2exhibitjson.xsl
> >[2] http://www.simile-widgets.org/exhibit3/
>
>
--
You received this message because you are subscribed to the Google Groups
"SIMILE Widgets" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/simile-widgets?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.