dcausse added a comment.

  We should export the triples from a production journal to try to understand
  where the differences are. To do this we need to copy a journal and run some
  tools provided by Blazegraph.
  
  The tool is ExportKB. To run it we need all the jars present in the war (the
  jar for the Updater is not sufficient).
  
  - Extract the war somewhere:
  
    mkdir /tmp/blazegraph-war && cd /tmp/blazegraph-war && \
      jar xvf /srv/deployment/wdqs/wdqs/blazegraph-service-*-SNAPSHOT.war
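
  A quick sanity check can confirm that the extraction produced what ExportKB
  needs. This is a minimal sketch, assuming bash and that the war layout matches
  the paths used in the ExportKB command (`WEB-INF/lib/*` and
  `WEB-INF/classes/logback.xml`); the function name `check_war_extracted` is
  hypothetical.

```shell
#!/usr/bin/env bash
# check_war_extracted DIR (hypothetical helper): succeeds if DIR looks like an
# extracted wdqs war, i.e. DIR/WEB-INF/classes/logback.xml exists and
# DIR/WEB-INF/lib holds at least one jar. ExportKB needs every jar in that
# folder on its classpath, not just the Updater jar.
check_war_extracted() {
  local dir="$1"
  local lib="$dir/WEB-INF/lib"
  if [ ! -f "$dir/WEB-INF/classes/logback.xml" ]; then
    echo "missing $dir/WEB-INF/classes/logback.xml" >&2
    return 1
  fi
  local count
  # Count the jars directly under WEB-INF/lib (0 if the folder is missing).
  count=$(find "$lib" -maxdepth 1 -name '*.jar' 2>/dev/null | wc -l)
  count=$((count))  # normalise: some wc implementations pad with spaces
  if [ "$count" -eq 0 ]; then
    echo "no jars found in $lib" >&2
    return 1
  fi
  echo "$count jars in $lib"
}
```

  For example: `check_war_extracted /tmp/blazegraph-war`.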
  
  Then move to the folder containing the `wikidata.jnl` file and run:
  
    java -Dlogback.configurationFile=/tmp/blazegraph-war/WEB-INF/classes/logback.xml \
      -cp '/tmp/blazegraph-war/WEB-INF/lib/*' \
      -server com.bigdata.rdf.sail.ExportKB \
      -outdir journal_export/ -format Turtle \
      /srv/deployment/wdqs/wdqs/RWStore.properties wdq
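
  The same invocation can be wrapped in a small bash function so the paths are
  easier to change when running on a non-production host. This is a hypothetical
  sketch: `export_kb`, the `WAR_DIR`/`PROPS`/`OUT_DIR`/`NAMESPACE` variables and
  the `DRY_RUN` switch belong to this sketch only, not to Blazegraph. The
  defaults are the values from the command above, and the `wikidata.jnl` check
  assumes `RWStore.properties` points at a journal file relative to the current
  directory.

```shell
#!/usr/bin/env bash
# export_kb (hypothetical wrapper): runs com.bigdata.rdf.sail.ExportKB with the
# same arguments as the command above. Override the defaults through WAR_DIR,
# PROPS, OUT_DIR and NAMESPACE; DRY_RUN=1 only prints the command line.
export_kb() {
  local war_dir="${WAR_DIR:-/tmp/blazegraph-war}"
  local props="${PROPS:-/srv/deployment/wdqs/wdqs/RWStore.properties}"
  local out_dir="${OUT_DIR:-journal_export/}"
  local namespace="${NAMESPACE:-wdq}"
  # The classpath wildcard stays quoted: java expands lib/* itself.
  local cmd=(java
    "-Dlogback.configurationFile=$war_dir/WEB-INF/classes/logback.xml"
    -cp "$war_dir/WEB-INF/lib/*"
    -server com.bigdata.rdf.sail.ExportKB
    -outdir "$out_dir" -format Turtle
    "$props" "$namespace")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
    return 0
  fi
  # Assumption: the journal path in RWStore.properties is relative, so this
  # must run from the folder containing wikidata.jnl.
  if [ ! -f wikidata.jnl ]; then
    echo "wikidata.jnl not found in $(pwd)" >&2
    return 1
  fi
  "${cmd[@]}"
}
```

  For example, `DRY_RUN=1 export_kb` prints the full command line for review;
  running `export_kb` from the journal folder performs the export.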
  
  We don't have to run this on a production machine: we just need the wdqs war
  and the `RWStore.properties`. The required space will probably be somewhere
  between 500 GB and 1 TB. Ideally we'd like the exported files to be in HDFS in
  the analytics network, so we could use more compute to run basic aggregations
  to detect where the differences are.
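
  Given that size estimate, it is worth checking free space before exporting and
  then copying the result into HDFS. A minimal sketch, assuming bash, POSIX
  `df -P` output and a host with the Hadoop client installed; `hdfs dfs -mkdir -p`
  and `hdfs dfs -put` are standard Hadoop shell commands, while the function
  names and any HDFS target directory are hypothetical.

```shell
#!/usr/bin/env bash
# free_gb DIR: prints the space available on the filesystem holding DIR, in GiB
# (df -Pk reports 1024-byte blocks; column 4 is "Available").
free_gb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# has_space DIR MIN_GB: succeeds if DIR has at least MIN_GB GiB available.
has_space() {
  [ "$(free_gb "$1")" -ge "$2" ]
}

# upload_export LOCAL_DIR HDFS_DIR: creates HDFS_DIR and copies the export
# into it. With DRY_RUN=1 the hdfs commands are printed instead of executed.
upload_export() {
  local run=()
  [ "${DRY_RUN:-0}" = 1 ] && run=(echo)
  "${run[@]}" hdfs dfs -mkdir -p "$2"
  "${run[@]}" hdfs dfs -put "$1" "$2"
}
```

  For example, `has_space . 1000 && upload_export journal_export/ /user/$USER/wdqs_journal_export`
  (1000 GiB is slightly above 1 TB; the HDFS path is only an illustration).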

TASK DETAIL
  https://phabricator.wikimedia.org/T239898

To: dcausse
Cc: dcausse, Aklapper, darthmon_wmde, DannyS712, Nandana, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs