Hi!

> Good idea: I’m trying to load the JSON into an Apache Jena TDB triplestore
> for later processing and querying. I have a couple of JSON dumps locally. I
> chose the JSON format because of the size difference (the BZ2 JSON is 8GB and
> the BZ2 TTL beta is 12GB). What would be the best way to do this? I’d rather
> use the Java Wikidata Toolkit instead of the PHP stack if possible.

I do not think the JSON export is particularly well suited for import into a triple store. It implements the original data model, which needs significant re-mapping to fit the triple store model (e.g. see https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Data_model ).

You can use one of the following approaches:

1. Use the existing RDF dump (yes, it is bigger, because the triple store representation is more verbose by nature).
2. Try to convert manually, e.g. with the Java Wikidata Toolkit, using something like https://github.com/Wikidata/Wikidata-Toolkit-Examples/blob/master/src/examples/RdfSerializationExample.java

Note that the RDF dump contains a little bit more data than the JSON one: IIRC, page properties (https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Page_properties) are not in the JSON dump, and there might be other small differences.

-- 
Stas Malyshev
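
To give a rough feel for the re-mapping mentioned above, here is a minimal Python sketch, not the actual exporter or Wikidata Toolkit logic. It assumes the usual shape of an entity in the JSON dump (`claims` -> `mainsnak` -> `datavalue`) and the `wd:` / `wdt:` URI prefixes described in the RDF dump format page. The function name `entity_to_ntriples` and the trimmed sample entity are hypothetical. It only handles item-valued main snaks and ignores ranks, qualifiers, references and every other value type, which is exactly the part that makes a real conversion non-trivial.

```python
import json

# Prefixes as described in the RDF dump format documentation (assumed here).
WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"


def entity_to_ntriples(entity):
    """Flatten the item-valued main snaks of one entity into N-Triples lines.

    Hypothetical helper: ranks, qualifiers, references and non-item
    values are deliberately skipped.
    """
    lines = []
    subject = f"<{WD}{entity['id']}>"
    for prop, statements in entity.get("claims", {}).items():
        for statement in statements:
            snak = statement["mainsnak"]
            if snak.get("snaktype") != "value":
                continue  # "novalue" / "somevalue" snaks carry no datavalue
            datavalue = snak["datavalue"]
            if datavalue["type"] != "wikibase-entityid":
                continue  # strings, times, quantities etc. need their own mapping
            value = datavalue["value"]
            if value.get("entity-type") != "item":
                continue
            target = f"Q{value['numeric-id']}"
            lines.append(f"{subject} <{WDT}{prop}> <{WD}{target}> .")
    return lines


# Heavily trimmed, made-up entity in the shape of one JSON dump record.
sample = json.loads("""
{
  "id": "Q42",
  "claims": {
    "P31": [{
      "type": "statement",
      "rank": "normal",
      "mainsnak": {
        "snaktype": "value",
        "property": "P31",
        "datavalue": {
          "type": "wikibase-entityid",
          "value": {"entity-type": "item", "numeric-id": 5}
        }
      }
    }],
    "P18": [{
      "type": "statement",
      "rank": "normal",
      "mainsnak": {
        "snaktype": "value",
        "property": "P18",
        "datavalue": {"type": "string", "value": "Example.jpg"}
      }
    }]
  }
}
""")

for line in entity_to_ntriples(sample):
    print(line)
```

Output in this form could be written to an `.nt` file and bulk-loaded into a triple store, but for anything beyond a toy it is probably less work to start from the existing RDF dump or the Toolkit example linked above.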
