On 20/03/16 17:16, Ignacio Tripodi wrote:
Hello,
I was wondering if you had any minimum hardware suggestions for a
Jena/Fuseki Linux deployment, based on the number of triples used. Is there
a rough guideline for how much RAM should be available in production, as a
function of the size of the imported RDF file (currently less than 2Gb),
number of concurrent requests, etc?
The main use for this will be for wildcarded text searches using the Lucene
full-text index (basically, unfiltered queries using the reverse index). No
SPARQL Update needed. Other resource-intensive operations would be
refreshing the RDF data monthly, followed by rebuilding indices. The test
deployment on my 2012 MacBook runs queries in the order of tens of ms
(unless it's been idle for a while, then the first query is usually in the
order of hundreds of ms for some reason), so I imagine the hardware
requirements can't be that stringent. If it helps, I had to increase my
Java heap size to 3072Mb.
Thanks for any feedback you could provide!
[[
This has been asked on StackOverflow - please copy answers from one
place to the other.
]]
That is 2G in bytes - how many triples is it?
Is this Lucene or Solr?
Is the RDF data held in TDB as the storage? If so, then part of the memory
use is due to TDB using memory-mapped files - these live in the OS file
system cache, not in the Java heap. The amount of space needed flexes with
use (the OS does the flexing automatically).
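
To illustrate the heap versus file system cache distinction, here is a
minimal sketch using plain java.nio. It is not TDB code; the class and
method names (MappedFileDemo, sumBytes, isMappedOffHeap) are made up for
this example. It only shows that a mapped file is backed by a direct
(off-heap) buffer, so its pages are managed by the OS and do not count
against -Xmx.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; not part of Jena or TDB.
class MappedFileDemo {

    // Map the whole file read-only and sum its bytes (as unsigned values).
    // The file contents are paged in by the OS, not copied into a heap array.
    static long sumBytes(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            while (buf.hasRemaining()) {
                sum += buf.get() & 0xFF;
            }
            return sum;
        }
    }

    // A mapped buffer is a direct buffer: its storage lives outside the Java heap.
    static boolean isMappedOffHeap(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return buf.isDirect();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mapped-demo", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[1 << 20]); // 1 MiB of zeros as stand-in data

        System.out.println("max heap (bytes): " + Runtime.getRuntime().maxMemory());
        System.out.println("mapped buffer is off-heap: " + isMappedOffHeap(tmp));
        System.out.println("byte sum: " + sumBytes(tmp));
    }
}
```

In practice this means the machine needs free RAM beyond the Java heap so
the OS has room to cache the mapped database files.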
For TDB:
TDB write transactions use memory for intermediate space. Read requests
do not normally take space over and above the database caching.
If the data has many large literals, then more heap may be needed;
otherwise the space is due to Lucene itself. The Jena text subsystem
materializes results, so very large result sets may also be a factor.
The slow first query after an idle period is possibly because either the
machine is swapping and the in-RAM cached data got swapped out, or the
data has been displaced from the file system cache and so it has to be
read back from persistent storage. If you were doing other things on
the machine, it is more likely the latter.
Andy