Fredah's message was sent to a number of RDF system lists. The text
isn't directed at Jena specifically; it is the same text to all lists.
Does using a dictionary count as compression? TDB (and SDB) use a term
dictionary (= node table), unlike, say, the old RDB system, which
stored terms inline in the triple table.
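A rough sketch of the idea, purely illustrative and not the actual TDB
code: terms go into a node table that hands out fixed-width ids, and
the triple indexes then store tuples of those ids rather than the
terms themselves.

import java.util.HashMap;
import java.util.Map;

class NodeTableSketch {
    // term -> id and id -> term: the two directions of the dictionary
    private final Map<String, Long> term2id = new HashMap<>();
    private final Map<Long, String> id2term = new HashMap<>();
    private long nextId = 0;

    long getOrAllocateId(String term) {
        return term2id.computeIfAbsent(term, t -> {
            long id = nextId++;
            id2term.put(id, t);
            return id;
        });
    }

    String getTerm(long id) {
        return id2term.get(id);
    }

    public static void main(String[] args) {
        NodeTableSketch nt = new NodeTableSketch();
        long s = nt.getOrAllocateId("<http://example/subject>");
        // a stored triple is then just three longs, e.g. (s, p, o)
        System.out.println(s + " -> " + nt.getTerm(s));
    }
}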
Compression in TDB is interesting for scale, but it is a tradeoff. By
using memory-mapped files, the on-disk form and the form accessed in
Java are the same, so the "decompress on read" style does not apply;
memory-mapped files avoid that copy/decompress step, and only the
necessary bytes are touched. The OS does the work of caching, and it's
quite well tuned for that.
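For the memory-mapped access, the underlying mechanism is plain
java.nio file mapping. A minimal sketch using JDK classes only, not
TDB's internals; the file name is made up:

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedRead {
    public static void main(String[] args) throws IOException {
        Path file = Paths.get("data.idx");   // hypothetical index file
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Map the file into the address space: no copy into a Java
            // buffer, and only the pages actually touched get read in.
            MappedByteBuffer buf =
                ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long firstSlot = buf.getLong(0);   // touches a single page
            System.out.println("first 8 bytes as a long: " + firstSlot);
        }
    }
}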
Andy
On 10/05/15 17:32, Claude Warren wrote:
Fredah,
As far as I know, the only compression in the system is in the
interaction with remote systems, where the compression flag can be
enabled to compress HTTP/S responses from Fuseki and from federated
queries.
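At the HTTP level that is just standard content negotiation. A minimal
sketch with plain java.net classes rather than the Jena client API;
the endpoint URL is only an example:

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.zip.GZIPInputStream;

public class GzipQuery {
    public static void main(String[] args) throws Exception {
        String query = URLEncoder.encode(
            "SELECT * WHERE { ?s ?p ?o } LIMIT 10", "UTF-8");
        URL url = new URL("http://localhost:3030/ds/query?query=" + query);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "application/sparql-results+json");
        conn.setRequestProperty("Accept-Encoding", "gzip");  // ask for gzip

        InputStream in = conn.getInputStream();
        if ("gzip".equalsIgnoreCase(conn.getContentEncoding())) {
            in = new GZIPInputStream(in);  // server compressed the response
        }
        try (BufferedReader r =
                 new BufferedReader(new InputStreamReader(in, "UTF-8"))) {
            for (String line; (line = r.readLine()) != null; ) {
                System.out.println(line);
            }
        }
    }
}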
I suppose some storage engines could implement compression, but that
would be on an engine-by-engine basis.
How the data are stored is also determined on an engine-by-engine
basis. From what I can tell, most installations use TDB (a native
storage engine); Andy Seaborne would be able to speak to how it stores
data, but it is a native format with several indexes. Another possible
storage engine is SDB, but that is mostly retired; it uses a
relational database to store the data in several tables with several
indexes. There is an in-memory engine, and I have implemented a
bloom-filter-based engine built on top of a relational storage model.
I suspect there are other storage engines available, but I don't know
what they are or how they are implemented.
Claude
On Thu, May 7, 2015 at 1:29 PM, Fredah B <[email protected]> wrote:
Dear Team,
I plan on using your SPARQL engine for my project implementation. I’m
impressed by the tremendous work you have put in to make this engine a
success; however, I did notice that the underlying infrastructure and
compression technique used are encapsulated. I need to fully
understand how the data is processed from start to finish, especially
with regard to compression. Are there, by any chance, papers that
cover the compression and decompression used in your engine, or is it
possible to refer me to someone who may be able to explain it to me?
Also, is compression on by default, or is it turned on and off
depending on the data load of the system? I was also wondering how you
store the data internally. That is, in what format is the data stored?
Is it an internally created representation or one of the standard RDF
representations?
I would really appreciate your assistance in answering these questions and
look forward to hearing from you soon.
Best Regards,
Fredah