Hello,

I've had some trouble loading datasets into Fuseki through PUT requests, so I decided to investigate whether it leaks memory. It seems to do so both in Fuseki 0.2.4 and in the current 0.2.5-SNAPSHOT. This may be related to JENA-101 [1], which identified a similar memory leak, but that particular issue was (claimed to be) fixed in August 2011.

My test is this:

0. Create an empty directory for TDB: mkdir tdb

1. Start up Fuseki with ./fuseki-server --update --loc=tdb /ds

2. Repeatedly PUT the test dataset (STW thesaurus [2] in RDF/XML format, 16MB file, about 1.2M triples) using this command:

while true ; do ./s-put http://localhost:3030/ds/data default ../stw.rdf ; sleep 20 ; done

The "sleep 20" is there to let Java garbage collection do its work between requests. Fuseki CPU usage falls to 0% during this time.
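
For anyone who wants to reproduce the per-request numbers without watching top, the loop above could be extended roughly as follows. This is only a sketch, not what I actually ran: the function names rss_kb and put_and_log are made up for illustration, and it assumes a Linux /proc filesystem and that you pass in the PID of the Fuseki JVM yourself.

```shell
#!/bin/sh
# Sketch only: log the resident memory of a process after each PUT.
# Assumptions: Linux /proc is available; rss_kb and put_and_log are
# hypothetical helper names, not part of Fuseki or its s-* scripts.

# Print the resident set size (VmRSS, in kB) of the process with PID $1.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# PUT the test file $1 times; after each request, wait 20 seconds and
# print the request number and the RSS of the process with PID $2.
put_and_log() {
    count=$1
    pid=$2
    i=1
    while [ "$i" -le "$count" ]; do
        ./s-put http://localhost:3030/ds/data default ../stw.rdf || return 1
        sleep 20
        echo "$i $(rss_kb "$pid")"
        i=$((i + 1))
    done
}
```

It could be called as put_and_log 11 PID, where PID is the Fuseki process ID taken from top or ps. Each output line is the request number followed by the RSS in kB.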

The JVM is run with the -Xmx1200M option, which seems to be the default in the Fuseki startup scripts. I'm using an Ubuntu 12.04 amd64 machine with 8GB of memory. Java is OpenJDK 6; java -version says "1.6.0_24". I ran the tests a few times just in case.

Results:

A. jena-fuseki-0.2.4 release

Fuseki 0.2.4 can handle 10 PUTs. Resident memory usage (RSS) grows by about 50-100MB per request and plateaus at 1.5GB according to top. On the 11th request, I get a "java.lang.OutOfMemoryError: GC overhead limit exceeded" error.


B. jena-fuseki-0.2.5-SNAPSHOT of 2012-09-27 downloaded from the snapshot directory [3]

Fuseki 0.2.5-SNAPSHOT can also handle 10 PUTs. Resident memory usage (RSS) grows by about 50-100MB per request and plateaus at 1.5GB according to top. On the 11th request, I get either a "java.lang.OutOfMemoryError: Java heap space" or a "java.lang.OutOfMemoryError: GC overhead limit exceeded" error.


Thoughts:

I know I should probably use tdbloader to load datasets instead of PUT requests, but I think this memory leak is quite excessive. This is not a very large dataset, yet I can only upload it ten times before Fuseki runs out of memory. Of course I could increase the heap memory given to the JVM to 2-3 GB as recommended by several sources, but that would probably just push the limit up by 2-3x.

Is this a bug or expected behavior?

Thanks,
Osma


[1] https://issues.apache.org/jira/browse/JENA-101

[2] http://zbw.eu/stw/versions/latest/download/about.en.html

[3] https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-fuseki/0.2.5-SNAPSHOT/

--
Osma Suominen | [email protected] | +358 40 5255 882
Aalto University, Department of Media Technology, Semantic Computing Research Group, Room 2541, Otaniementie 17, Espoo, Finland; P.O. Box 15500, FI-00076 Aalto, Finland
