
Now that SQID supports confirming or rejecting statements from Primary Sources (PS, the Freebase imports), I have noticed some systematic issues with the suggested data. I believe many of the proposals should be removed because they are already represented in Wikidata and do not need to be imported.

I have found three types of such data so far:

(1) Redundant "located in the administrative territorial entity"/"contains administrative territorial entity". Wikidata stores only the next territory above/below the current one in these relations. PS often suggests territories reachable through several steps instead.

- https://tools.wmflabs.org/sqid/#/view?id=Q980 (login first to see suggestions). There are almost 100 towns that fall into this area suggested here, but they all should be organised in more specific sub-regions of the hierarchy.
- https://tools.wmflabs.org/sqid/#/view?id=Q10474 There is a higher-level territory suggested here (Bavaria) even though "Lower Bavaria" is already present.

Similar things are found, e.g., for occupation (P106), where a person who is already a "sport cyclist" might be suggested to be a "sportsperson".
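
To illustrate case (1), here is a minimal sketch of how a provider-side filter might detect a suggestion that is already implied by an existing statement. It assumes a lookup table of direct "parent" relations is available; the function name `is_implied`, the `PARENTS` table and its entries are illustrative placeholders, not data queried from Wikidata and not part of PS.

```python
from collections import deque

# Toy hierarchy of direct parents (illustrative only, not real Wikidata output).
# For territories this would be "located in the administrative territorial
# entity"; for occupations it is assumed to be a "subclass of" style relation.
PARENTS = {
    "Lower Bavaria": {"Bavaria"},
    "Bavaria": {"Germany"},
    "sport cyclist": {"sportsperson"},
}

def is_implied(existing_values, suggested, parents):
    """Return True if `suggested` equals an existing value or is reachable
    from one by following `parents` links (breadth-first search)."""
    seen = set(existing_values)
    queue = deque(existing_values)
    while queue:
        current = queue.popleft()
        if current == suggested:
            return True
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False

# "Bavaria" is suggested although "Lower Bavaria" is already present.
print(is_implied(["Lower Bavaria"], "Bavaria", PARENTS))
# An unrelated value is not implied and would remain a valid suggestion.
print(is_implied(["Lower Bavaria"], "Saxony", PARENTS))
```

The same check would cover the occupation example, under the assumption that the more general occupation can be reached from the more specific one through the hierarchy.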

(2) Syntactic variations of the "same" value. Typical cases are URLs, which PS suggests with trailing "/" even after top-level domains, while Wikidata often omits it. This means you have suggestions like "http://www.pirna.de/" when there is already "http://www.pirna.de".
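
A rough sketch of how such URL variants might be detected before they are offered as suggestions. It only handles the trailing "/" after the host and the case of scheme and host; the helper names `normalize_url` and `is_duplicate_url` are my own, and the "/rathaus/" path in the example is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url):
    """Lowercase scheme and network location, and drop a path that is
    empty or just "/". Longer paths are left untouched."""
    parts = urlsplit(url.strip())
    path = "" if parts.path in ("", "/") else parts.path
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path,
         parts.query, parts.fragment)
    )

def is_duplicate_url(suggested, existing_urls):
    """Return True if `suggested` matches an existing URL after normalization."""
    existing = {normalize_url(u) for u in existing_urls}
    return normalize_url(suggested) in existing

# The suggestion differs from the existing value only by the trailing "/".
print(is_duplicate_url("http://www.pirna.de/", ["http://www.pirna.de"]))
# A URL with a real path is a different value (hypothetical path).
print(is_duplicate_url("http://www.pirna.de/rathaus/", ["http://www.pirna.de"]))
```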


(3) Redirect items as values. PS sometimes suggests statement values that are redirects to other entities, for which there already is a statement.
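
As a sketch of how case (3) might be caught, suggested values could be resolved through a redirect table before being compared with existing statements. The table `redirects` and the identifiers "Q_OLD" and "Q_NEW" are placeholders; how PS or Wikidata would actually supply the redirect information is not something I am assuming here.

```python
def resolve_redirect(item_id, redirects):
    """Follow redirect links until a non-redirect item is reached.
    The `seen` set guards against accidental redirect cycles."""
    seen = set()
    while item_id in redirects and item_id not in seen:
        seen.add(item_id)
        item_id = redirects[item_id]
    return item_id

def is_redundant_after_redirect(suggested, existing_values, redirects):
    """Return True if the suggested value resolves to an item that is
    already used as a value of an existing statement."""
    resolved_existing = {resolve_redirect(v, redirects) for v in existing_values}
    return resolve_redirect(suggested, redirects) in resolved_existing

# Placeholder redirect table: "Q_OLD" was merged into "Q_NEW".
redirects = {"Q_OLD": "Q_NEW"}

# The item already has "Q_NEW" as a value, so suggesting "Q_OLD" adds nothing.
print(is_redundant_after_redirect("Q_OLD", ["Q_NEW"], redirects))
```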

All of these cases should be fixed on the provider side, not by hiding suggestions in the UI (as the PS gadget seems to do for case (2)). This would also help to get better statistics: right now, all I can do is reject all of these values, but this might be misleading if one looks at the PS statistics, since the suggestions are not wrong, but simply unnecessary.

Simply hiding suggestions that are not eliminated from the data also makes the PS service's feature for finding items with suggestions much less useful (you might find items that do not show you any suggestion).

I was wondering if anybody is still working on PS clean-up now or if this part of the project is orphaned.



Prof. Dr. Markus Kroetzsch
Knowledge-Based Systems Group
Center for Advancing Electronics Dresden (cfaed)
Faculty of Computer Science
TU Dresden
+49 351 463 38486

Wikidata mailing list
