Hello,
I've had some trouble loading datasets to Fuseki through PUT requests,
so I decided to investigate whether it leaks memory. It seems to do so
both in Fuseki 0.2.4 and in the current 0.2.5-SNAPSHOT. This may be
related to JENA-101 [1] which identified a similar memory leak, but that
particular issue was (claimed to be) fixed in August 2011.
My test is this:
0. Create an empty directory for TDB: mkdir tdb
1. Start up Fuseki with ./fuseki-server --update --loc=tdb /ds
2. Repeatedly PUT the test dataset (STW thesaurus [2] in RDF/XML format,
16 MB file, about 1.2M triples) using this command:
while true ; do ./s-put http://localhost:3030/ds/data default ../stw.rdf ; sleep 20 ; done
The "sleep 20" is there to let Java garbage collection do its work
between requests. Fuseki CPU usage falls to 0% during this time.
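The loop above can be extended to log RSS after each PUT instead of reading it off top by hand. This is only a sketch: the helper names (kb_to_mb, rss_mb, put_and_log) are made up for illustration, and it assumes a Linux ps that supports "-o rss=" and that the Fuseki JVM can be found with pgrep -f fuseki-server.

```shell
#!/bin/sh
# Sketch: run the PUT test and print the server's RSS after each request.
# Assumptions (not part of the original test): procps-style ps and pgrep.

kb_to_mb() {
    # Convert kilobytes (the unit ps reports) to whole megabytes.
    echo $(( $1 / 1024 ))
}

rss_mb() {
    # Print the resident set size of PID $1 in MB; fail if the PID is gone.
    kb=$(ps -o rss= -p "$1" 2>/dev/null | tr -d ' ')
    [ -n "$kb" ] || return 1
    kb_to_mb "$kb"
}

put_and_log() {
    # $1 = PID of the Fuseki JVM, $2 = number of PUT requests to send.
    pid=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        ./s-put http://localhost:3030/ds/data default ../stw.rdf
        sleep 20    # same pause as in the original loop
        echo "PUT $i: RSS $(rss_mb "$pid") MB"
        i=$((i + 1))
    done
}

# Example (hypothetical invocation):
#   put_and_log "$(pgrep -f fuseki-server | head -n 1)" 15
```

The functions do nothing until put_and_log is called, so the file can be sourced from an interactive shell in the Fuseki directory.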
The JVM is run with the -Xmx1200M option, which seems to be the default
in the Fuseki startup scripts. I'm using an Ubuntu 12.04 amd64 machine
with 8 GB of memory. Java is OpenJDK 6; java -version says "1.6.0_24". I
ran the tests a few times just in case.
Results:
A. jena-fuseki-0.2.4 release
Fuseki 0.2.4 can handle 10 PUTs. Resident memory usage (RSS) grows by
about 50-100 MB per request and plateaus at 1.5 GB according to top. On
the 11th request, I get a "java.lang.OutOfMemoryError: GC overhead limit
exceeded" error.
B. jena-fuseki-0.2.5-SNAPSHOT of 2012-09-27 downloaded from the snapshot
directory [3]
Fuseki 0.2.5-SNAPSHOT can handle 10 PUTs. Resident memory usage (RSS)
grows by about 50-100 MB per request and plateaus at 1.5 GB according to
top. On the 11th request, I get either a "java.lang.OutOfMemoryError:
Java heap space" or a "java.lang.OutOfMemoryError: GC overhead limit
exceeded" error.
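RSS as reported by top also includes JVM memory outside the Java heap, so it does not show directly how full the heap is. A rough sketch of how old-generation occupancy could be sampled after each PUT with jstat follows. It assumes the JDK tools (jstat) are installed and run as the same user as Fuseki; the helper name old_gen_pct is made up for illustration.

```shell
#!/bin/sh
# Sketch: extract old-generation occupancy from "jstat -gcutil" output.
# Assumption: the fourth column of -gcutil output is O (old generation,
# percent used), as in the column order S0 S1 E O ...

old_gen_pct() {
    # Read jstat -gcutil output on stdin, skip the header line, and
    # print the O column of the last sample.
    awk 'NR > 1 && NF > 0 { o = $4 } END { print o }'
}

# Example (hypothetical invocation, with $pid set to the Fuseki JVM PID):
#   jstat -gcutil "$pid" | old_gen_pct
```

If the value stays close to 100 after the 20-second pause, that would suggest the heap is held by live objects rather than by garbage that has not been collected yet.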
Thoughts:
I know I should probably use tdbloader to load datasets instead of PUT
requests, but I think this amount of memory leakage is excessive. This
is not a very large dataset, but I can only upload it ten times before
Fuseki runs out of memory. Of course I could increase the heap memory
given to the JVM to 2-3 GB as recommended by several sources, but that
would probably just push the limit up by 2-3x.
Is this a bug or expected behavior?
Thanks,
Osma
[1] https://issues.apache.org/jira/browse/JENA-101
[2] http://zbw.eu/stw/versions/latest/download/about.en.html
[3] https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-fuseki/0.2.5-SNAPSHOT/
--
Osma Suominen | [email protected] | +358 40 5255 882
Aalto University, Department of Media Technology, Semantic Computing
Research Group
Room 2541, Otaniementie 17, Espoo, Finland; P.O. Box 15500, FI-00076
Aalto, Finland