That's what I get from the metadata/header:

bin/hdtInfo.sh ~/wikidata.hdt

<file://wikidata.ttl> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/HDT/hdt#Dataset> .
<file://wikidata.ttl> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdfs.org/ns/void#Dataset> .
<file://wikidata.ttl> <http://rdfs.org/ns/void#triples> "4579973187" .
<file://wikidata.ttl> <http://rdfs.org/ns/void#properties> "17301" .
<file://wikidata.ttl> <http://rdfs.org/ns/void#distinctSubjects> "481902070" .
<file://wikidata.ttl> <http://rdfs.org/ns/void#distinctObjects> "715508797" .
<file://wikidata.ttl> <http://purl.org/HDT/hdt#statisticalInformation> _:statistics .
<file://wikidata.ttl> <http://purl.org/HDT/hdt#publicationInformation> _:publicationInformation .
<file://wikidata.ttl> <http://purl.org/HDT/hdt#formatInformation> _:format .
_:format <http://purl.org/HDT/hdt#dictionary> _:dictionary .
_:format <http://purl.org/HDT/hdt#triples> _:triples .
_:dictionary <http://purl.org/dc/terms/format> <http://purl.org/HDT/hdt#dictionaryFour> .
_:dictionary <http://purl.org/HDT/hdt#dictionarynumSharedSubjectObject> "381953626" .
_:dictionary <http://purl.org/HDT/hdt#dictionarymapping> "1" .
_:dictionary <http://purl.org/HDT/hdt#dictionarysizeStrings> "22827063388" .
_:dictionary <http://purl.org/HDT/hdt#dictionaryblockSize> "16" .
_:triples <http://purl.org/dc/terms/format> <http://purl.org/HDT/hdt#triplesBitmap> .
_:triples <http://purl.org/HDT/hdt#triplesnumTriples> "4579973187" .
_:triples <http://purl.org/HDT/hdt#triplesOrder> "SPO" .
_:statistics <http://purl.org/HDT/hdt#originalSize> "198373280855" .
_:statistics <http://purl.org/HDT/hdt#hdtSize> "47873693833" .
_:publicationInformation <http://purl.org/dc/terms/issued> "2017-11-03T21:24:29+01:00" .
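For anyone who wants to pull these values out of the header programmatically, here is a minimal Python sketch. It is not part of the HDT tooling: `HEADER`, `LINE` and `header_literals` are names made up for this example, and the regular expression only handles the simple `subject <predicate> "literal" .` lines seen in the output above, not N-Triples in general. It reads the triple order, checks whether the triple count exceeds 2^32, and computes the size ratio of the HDT file to the original dump.

```python
import re

# A few lines copied from the hdtInfo.sh output above.
HEADER = """\
_:triples <http://purl.org/HDT/hdt#triplesnumTriples> "4579973187" .
_:triples <http://purl.org/HDT/hdt#triplesOrder> "SPO" .
_:statistics <http://purl.org/HDT/hdt#originalSize> "198373280855" .
_:statistics <http://purl.org/HDT/hdt#hdtSize> "47873693833" .
"""

# Matches: subject, <predicate IRI>, "plain literal", terminating dot.
# Assumption: this is enough for these header lines; it is not a full parser.
LINE = re.compile(r'^(\S+)\s+<([^>]+)>\s+"([^"]*)"\s*\.$')


def header_literals(text):
    """Map the local name of each predicate (the part after '#') to its literal."""
    values = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            predicate = m.group(2)
            values[predicate.rsplit("#", 1)[-1]] = m.group(3)
    return values


info = header_literals(HEADER)
num_triples = int(info["triplesnumTriples"])
ratio = int(info["hdtSize"]) / int(info["originalSize"])
exceeds_32bit = num_triples > 2**32

print("order:", info["triplesOrder"])
print("triples:", num_triples, "(more than 2^32)" if exceeds_32bit else "")
print("HDT size / original size: %.3f" % ratio)
```

To run it against the real header, the complete hdtInfo.sh output could be passed to `header_literals` instead of the shortened `HEADER` string.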
In particular:

_:triples <http://purl.org/HDT/hdt#triplesOrder> "SPO" .

Moreover, the beginning of the wikidata.hdt.index file contains:

$HDT^E<http://purl.org/HDT/hdt#indexFoQ>^@numTriples=4579973187;order=1;

I don't know how/where the .index file is taken into account. According to the docs, that happens the first time a search is triggered.

But let's stop here - off list. Continue on the HDT list/forum if necessary.

Lorenz

On 18.12.2017 11:03, Dick Murray wrote:
> On 18 December 2017 at 08:07, Laura Morales <laure...@mail.com> wrote:
>
>>> They don't have index permutations spo, ops, pos, etc.
>> Yes they have, what you're saying is wrong. See
>> http://www.rdfhdt.org/hdt-binary-format/#triples
>> That's what the .hdt.index file is about, to store
>> more index permutations.
>>
> This is going off Jena list but do we know how the wiki HDT was compiled
> because having read the technical stuff including the link above the
> $$streamsOrder property (which defaults to SPO) sets the triple index
> order. Can you query the HDT header and see what this is set to? 0 = SPO,
> 1 = SOP, etc. Also check $$IDCodificationBits because Wiki blew the
> original HDT code as it exceeded 2^32 triples and there was a new 64 id
> code base in dev. Plus how big is the generated .hdt.index file (it's in
> the same folder as the .hdt file), this file is autogen as soon as you try
> and search the HDT.
>
> As previously mentioned this is best off this list, so dick-twocows on
> github.
>
>>
>>> To bring this thread to an end, I guess we finally answered your
>>> question? Or are there any open issues?
>> I think the only remaining open questions are:
>>
>> - since the problem was not with the OFFSET, would the query "SELECT ?s
>> FROM <wikidata> WHERE ..." also fail to terminate with a TDB-backed
>> namedGraph (instead of HDT)?
>>
>> - is there any improvement that can be added to Jena to solve these types
>> of queries faster, or is it just the way it is and nothing can be done
>> about it?
>>