dcausse added a comment.
We should export the triples from a production journal to try to understand
where the differences are. To do this we need to copy a journal and run some
tools provided by Blazegraph.
The tool is ExportKB. To run it we need all the jars present in the war (the
jar for the Updater is not sufficient).
- extract the war somewhere:
  mkdir /tmp/blazegraph-war && cd /tmp/blazegraph-war && \
    jar xvf /srv/deployment/wdqs/wdqs/blazegraph-service-*-SNAPSHOT.war
Then move to the folder containing the `wikidata.jnl` file and run:
  java -Dlogback.configurationFile=/tmp/blazegraph-war/WEB-INF/classes/logback.xml \
    -cp '/tmp/blazegraph-war/WEB-INF/lib/*' \
    -server com.bigdata.rdf.sail.ExportKB \
    -outdir journal_export/ -format Turtle \
    /srv/deployment/wdqs/wdqs/RWStore.properties wdq
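Before launching a long export it may help to confirm that everything the
command above relies on is in place. The following is only a sketch under
assumptions: `check_export_prereqs` is a hypothetical helper name, and the
three paths are passed in by the caller rather than taken from any wdqs tooling.

```shell
#!/bin/sh
# Pre-flight sketch. check_export_prereqs is a hypothetical helper, not part
# of Blazegraph or wdqs.
# Usage: check_export_prereqs WAR_DIR PROPERTIES_FILE JOURNAL_FILE
check_export_prereqs() {
    war_dir=$1
    props=$2
    journal=$3
    status=0

    # ExportKB needs the jars extracted from the war; the Updater jar alone
    # is not sufficient.
    if ! ls "$war_dir"/WEB-INF/lib/*.jar >/dev/null 2>&1; then
        echo "no jars found under $war_dir/WEB-INF/lib" >&2
        status=1
    fi

    # The logback configuration is referenced on the java command line.
    if [ ! -f "$war_dir/WEB-INF/classes/logback.xml" ]; then
        echo "missing $war_dir/WEB-INF/classes/logback.xml" >&2
        status=1
    fi

    if [ ! -f "$props" ]; then
        echo "missing properties file: $props" >&2
        status=1
    fi

    if [ ! -f "$journal" ]; then
        echo "missing journal file: $journal" >&2
        status=1
    fi

    return "$status"
}
```

For example, from the folder containing the journal one could call
`check_export_prereqs /tmp/blazegraph-war /srv/deployment/wdqs/wdqs/RWStore.properties ./wikidata.jnl`
and only start the java command if it succeeds.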
We don't have to run this on a production machine; we just need the wdqs war
and the `RWStore.properties`. The required space will probably be somewhere
between 500 GB and 1 TB. Ideally we'd like the exported triples to be in HDFS
in the analytics network so we could use more compute to run basic
aggregations to detect where the differences are.
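Given the size estimate, a quick free-space check on the target machine could
avoid a failed export. This is a minimal sketch: `has_free_space` is a
hypothetical helper, and it assumes the filesystem name printed by `df`
contains no spaces.

```shell
#!/bin/sh
# has_free_space DIR MIN_GB
# Succeeds if the filesystem holding DIR has at least MIN_GB GiB available.
# Hypothetical helper, shown only to illustrate the check.
has_free_space() {
    dir=$1
    min_gb=$2
    # df -P prints one fixed-format line per filesystem; -k reports sizes in
    # 1024-byte blocks. Column 4 is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}
```

For instance, `has_free_space . 1000 || echo "not enough space for the export" >&2`
checks the current folder against the upper end of the estimate.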
TASK DETAIL
https://phabricator.wikimedia.org/T239898