Hi!

> Good idea : I’m trying to load the JSON into an Apache Jena TDB triplestore 
> for later processing and querying. I have a couple of JSON dumps locally. I 
> chose the JSON format because of the size difference (the BZ2 JSON is 8GB and 
> the BZ2 TTL beta is 12GB). What would be the best way to do this ? I’d rather 
> use the Java Wikidata Toolkit instead of the PHP stack if possible.

I do not think the JSON export is particularly well-suited for import into
a triple store. It implements the original data model, which needs
significant re-mapping to fit the triple store model (see e.g.
https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Data_model
).
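
To give a rough idea of what that re-mapping involves, here is a minimal
Python sketch (standard library only, so neither Wikidata Toolkit nor the
real exporter) that turns the statements of one JSON entity into "truthy"
N-Triples lines. The function name and the sample entity are made up for
illustration. It only handles statements whose value is an item, and it
ignores everything else the RDF model carries (statement nodes,
qualifiers, references, labels, other value types), so treat it as an
illustration of the gap rather than a usable converter.

```python
import json

# Namespaces used by the Wikidata RDF export for entities and direct claims.
WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"


def truthy_item_triples(entity):
    """Yield N-Triples lines for best-rank statements whose value is an item.

    Hypothetical helper for illustration only; not part of any library.
    """
    subject = f"<{WD}{entity['id']}>"
    for prop, statements in entity.get("claims", {}).items():
        # Best rank: preferred statements if there are any, otherwise the
        # normal-rank ones. Deprecated statements are never included.
        preferred = [s for s in statements if s.get("rank") == "preferred"]
        best = preferred or [s for s in statements if s.get("rank") == "normal"]
        for statement in best:
            snak = statement["mainsnak"]
            if snak.get("snaktype") != "value":
                continue  # "novalue" / "somevalue" snaks carry no datavalue
            datavalue = snak["datavalue"]
            if datavalue["type"] != "wikibase-entityid":
                continue  # strings, times, quantities etc. need their own mapping
            value = datavalue["value"]
            if value.get("entity-type") != "item":
                continue  # this sketch only covers item (Q...) targets
            # Some dumps only carry "numeric-id", so fall back to it.
            target = value.get("id") or f"Q{value['numeric-id']}"
            yield f"{subject} <{WDT}{prop}> <{WD}{target}> ."


# Hand-written sample entity, cut down to a single statement.
sample = json.loads("""
{"id": "Q42", "claims": {"P31": [
  {"rank": "normal",
   "mainsnak": {"snaktype": "value", "property": "P31",
                "datavalue": {"type": "wikibase-entityid",
                              "value": {"entity-type": "item",
                                        "numeric-id": 5}}}}
]}}
""")

for line in truthy_item_triples(sample):
    print(line)
```

Even this cut-down case already needs rank handling and per-type value
mapping, which is why starting from the RDF dump is usually less work.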

You can use one of the following approaches:
1. Use the existing RDF dump (yes, it is bigger, because the triple store
representation is more verbose by nature).

2. Try to convert it yourself, e.g. with the Java Wikidata Toolkit, using
something like
https://github.com/Wikidata/Wikidata-Toolkit-Examples/blob/master/src/examples/RdfSerializationExample.java

Note that the RDF dump contains a little bit more data than the JSON one.
IIRC, page properties
(https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Page_properties)
are not in the JSON dump, and there might be other small differences.
-- 
Stas Malyshev
[email protected]

_______________________________________________
Wikidata-tech mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
