Smalyshev added a comment.

Right now I think this should be the plan:

The tool takes two .json config files: the new config and the old config (the old config is optional). Then:

  1. Diff the configs and produce a list of new units (see the config-diff sketch after this list).
  2. For each new primary unit:
    1. Run a SPARQL query to find all values using it, and generate self-referencing normalized statements with wikibase:quantityNormalized (see the value-query sketch below).
    2. Run a SPARQL query to find all statements using those values (if there are too many, we may have to split the query into batches) and generate parallel normalized statements for them, with the same value (see the batching sketch below).
  3. For each new non-primary unit:
    1. Run a SPARQL query to find all values using it, and generate a new converted value for each one. Generate RDF for those new values, and also wikibase:quantityNormalized statements on the old values (see the conversion sketch below).
    2. Run a SPARQL query to find all statements using those values, and generate parallel normalized statements for them, with the new converted value.
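
For step 1, a minimal sketch of the config diff. The shape of the configs here is an assumption (a JSON map from unit Q-id to its conversion info); the "factor"/"unit" field names and the convention that a primary unit converts to itself with factor 1 are assumptions too:

```
import json

def load_units(path):
    """Load a unit-conversion config; assumed shape {unit_qid: {"factor": ..., "unit": ...}}."""
    with open(path) as f:
        return json.load(f)

def diff_configs(new_path, old_path=None):
    """Return the units present in the new config but not in the old one.

    If no old config is given, every unit in the new config counts as new.
    """
    new_cfg = load_units(new_path)
    old_cfg = load_units(old_path) if old_path else {}
    return {qid: info for qid, info in new_cfg.items() if qid not in old_cfg}

def split_units(units):
    """Partition new units into primary and non-primary Q-ids.

    Assumed convention: a primary unit converts to itself with factor 1.
    """
    primary, non_primary = [], []
    for qid, info in units.items():
        if info.get("unit") == qid and info.get("factor") == "1":
            primary.append(qid)
        else:
            non_primary.append(qid)
    return primary, non_primary
```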
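For step 2.1, value nodes can be found through their wikibase:quantityUnit; for a primary unit the normalized value is the value node itself, so the generated triple is self-referencing. A sketch, assuming a local query-service endpoint and the requests library:

```
import requests

# Assumed local endpoint of the instance being updated; adjust as needed.
ENDPOINT = "http://localhost:9999/bigdata/namespace/wdq/sparql"

VALUES_FOR_UNIT = """
PREFIX wikibase: <http://wikiba.se/ontology#>
SELECT ?v WHERE {{ ?v wikibase:quantityUnit <{unit}> . }}
"""

def run_query(query):
    """Run a SPARQL SELECT via the standard protocol, return the JSON bindings."""
    r = requests.get(ENDPOINT, params={"query": query},
                     headers={"Accept": "application/sparql-results+json"})
    r.raise_for_status()
    return r.json()["results"]["bindings"]

def primary_unit_triples(unit_uri):
    """Yield one self-referencing normalized-value triple per value node.

    The prefixed name assumes a wikibase: @prefix declaration in the output file.
    """
    for b in run_query(VALUES_FOR_UNIT.format(unit=unit_uri)):
        v = b["v"]["value"]
        yield f"<{v}> wikibase:quantityNormalized <{v}> ."
```

The helper is called with the unit's full entity URI, and the yielded lines go straight into the output TTL.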
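For steps 2.2 and 3.2, statements point at value nodes through psv: predicates, and the parallel normalized statement uses the matching psn: predicate. A sketch of the batched query, reusing run_query from above; the batch size of 500 is a guess to be tuned against what the endpoint tolerates:

```
PSV = "http://www.wikidata.org/prop/statement/value/"
PSN = "http://www.wikidata.org/prop/statement/value-normalized/"

STATEMENTS_FOR_VALUES = """
SELECT ?st ?p ?v WHERE {{
  VALUES ?v {{ {values} }}
  ?st ?p ?v .
  FILTER(STRSTARTS(STR(?p), "{psv}"))
}}
"""

def chunked(items, size=500):
    """Split a list into batches so no single query gets too large."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def statement_triples(value_uris, normalized_for):
    """Yield a parallel psn: triple for every statement using one of the values.

    normalized_for maps each value node to its normalized node: the node itself
    for primary units, the converted node for non-primary ones.
    """
    for batch in chunked(list(value_uris)):
        values = " ".join(f"<{v}>" for v in batch)
        for b in run_query(STATEMENTS_FOR_VALUES.format(values=values, psv=PSV)):
            st, p, v = (b[k]["value"] for k in ("st", "p", "v"))
            psn_pred = p.replace(PSV, PSN)  # e.g. psv:P2046 -> psn:P2046
            yield f"<{st}> <{psn_pred}> <{normalized_for[v]}> ."
```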
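For step 3.1, the conversion itself is a multiplication by the config's factor plus a freshly minted value node. In this sketch the hash-based IRI minting and the restriction to amount and unit (real value nodes can also carry bounds) are assumptions, and the emitted lines presuppose wikibase:/xsd: prefix declarations in the output file:

```
import hashlib
from decimal import Decimal

WDV = "http://www.wikidata.org/value/"

def converted_value(value_uri, amount, factor, target_unit_uri):
    """Build the converted (normalized) value node for one non-primary value.

    Returns the new node's URI plus its triples, including the
    wikibase:quantityNormalized link from the old value node.
    """
    norm_amount = Decimal(amount) * Decimal(factor)
    # IRI minting scheme is an assumption (a content hash over amount + unit).
    key = f"{norm_amount} {target_unit_uri}".encode()
    norm_uri = WDV + hashlib.sha1(key).hexdigest()
    triples = [
        f"<{norm_uri}> a wikibase:QuantityValue .",
        f'<{norm_uri}> wikibase:quantityAmount "{norm_amount}"^^xsd:decimal .',
        f"<{norm_uri}> wikibase:quantityUnit <{target_unit_uri}> .",
        # the converted value is in the primary unit, so it normalizes to itself
        f"<{norm_uri}> wikibase:quantityNormalized <{norm_uri}> .",
        # and the old value now points at it
        f"<{value_uri}> wikibase:quantityNormalized <{norm_uri}> .",
    ]
    return norm_uri, triples
```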

The output of the tool will be RDF/TTL that can be bulk-loaded into the instance.
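
For illustration, a fragment of that output for one primary-unit value and one statement using it might look like the following (the value hash, the statement id, and P2046 are made-up examples):

```
@prefix wikibase: <http://wikiba.se/ontology#> .
@prefix psn: <http://www.wikidata.org/prop/statement/value-normalized/> .
@prefix wds: <http://www.wikidata.org/entity/statement/> .
@prefix wdv: <http://www.wikidata.org/value/> .

# primary unit: the value node normalizes to itself
wdv:exampleValueHash wikibase:quantityNormalized wdv:exampleValueHash .

# parallel normalized statement alongside the existing psv: triple
wds:Q64-example-statement psn:P2046 wdv:exampleValueHash .
```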

We need to check whether we will be able to hold all the values involved in memory. So far the most popular unit, square kilometre, has 13398 usages; holding all of them in memory should not be a problem, I think (even at a generous ~1 KB per value node, that is only ~13 MB).

