Since I don't think anyone answered your specific original question: TDB and TDB2 both use dictionary encoding (and in fact most RDF stores use some variation on this). Basically, they map each unique RDF term (whether URI, string, blank node, etc.) to a consistent internal identifier and use that identifier to refer to the term. A string that repeats across many triples is therefore stored once and referenced by its identifier everywhere else. Most internal data structures are implemented in terms of these identifiers, which are typically very compact (TDB and TDB2 use 64-bit identifiers), and the system only translates between the internal identifier and the full RDF term when explicitly needed, e.g. when presenting results.
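
As a rough illustration of the idea only, here is a toy in-memory sketch in Python. It is not how TDB or TDB2 is actually implemented: the class and method names are made up for this example, and a real store keeps the mapping on disk rather than in a dict and a list.

```python
# Toy dictionary encoding. TermDictionary, encode and decode are
# hypothetical names for illustration, not part of Jena/TDB.
class TermDictionary:
    def __init__(self):
        self._id_by_term = {}   # RDF term (as a string) -> integer id
        self._term_by_id = []   # integer id -> RDF term

    def encode(self, term):
        """Return the id for a term, allocating a new one on first sight."""
        node_id = self._id_by_term.get(term)
        if node_id is None:
            node_id = len(self._term_by_id)
            self._id_by_term[term] = node_id
            self._term_by_id.append(term)
        return node_id

    def decode(self, node_id):
        """Translate an id back to the full term, e.g. when presenting results."""
        return self._term_by_id[node_id]


terms = TermDictionary()
long_literal = '"' + "a long repeated description " * 10 + '"'
raw_triples = [
    ("<http://example.org/s1>", "<http://example.org/comment>", long_literal),
    ("<http://example.org/s2>", "<http://example.org/comment>", long_literal),
]

# Triples are stored as tuples of small integers, not as full strings.
encoded = [tuple(terms.encode(t) for t in triple) for triple in raw_triples]
decoded = [tuple(terms.decode(i) for i in triple) for triple in encoded]

print(encoded)
print(len(terms._term_by_id), "distinct terms stored")
```

The long literal and the predicate each appear in both triples but get a single dictionary entry, so the encoded triples only repeat the small integer ids.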

Rob

On 15/02/2019, 06:03, "Ekaterina Danilova" <[email protected]> wrote:

    i would like to ask how TDB2 and Fuseki manages big amounts of string data (especially repeating data) and what it the best practices. Does it optimize it somehow?
