Manuel added a comment.
Some thoughts about the notebook:
**Double checking**
Triples should always be distinct, correct? But the figure of 15 billion seems
lower than what I have read elsewhere.
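One way to double check the distinctness assumption would be to compare the total row count with the distinct row count. The following is only a minimal sketch in plain Python on made-up rows: the sample triples and the helper name `count_duplicates` are assumptions for illustration, not taken from the notebook, and the real check would have to run against the full table.

```python
from collections import Counter

# Made-up sample of (subject, predicate, object) rows, for illustration only.
triples = [
    ("wd:Q1", "rdfs:label", '"universe"@en'),
    ("wd:Q1", "rdfs:label", '"universe"@en'),  # deliberate duplicate
    ("wd:Q1", "schema:description", '"example description"@en'),
    ("wd:Q2", "rdfs:label", '"Earth"@en'),
]


def count_duplicates(rows):
    """Return (total rows, distinct rows, rows occurring more than once)."""
    counts = Counter(rows)
    repeated = {row: n for row, n in counts.items() if n > 1}
    return len(rows), len(counts), repeated


total, distinct, repeated = count_duplicates(triples)
print(f"total={total} distinct={distinct} repeated={len(repeated)}")
```

If the total and the distinct count match on the real data, the 15 billion would be a count of distinct triples, and the difference to other published numbers would need another explanation.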
**Size calculations**
The predicates look correct to me for this analysis.
predicate_representation_dict = {
    "label": "<http://www.w3.org/2000/01/rdf-schema#label>",
    "description": "<http://schema.org/description>",
    "alias": "<http://www.w3.org/2004/02/skos/core#altLabel>",
}
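To make explicit how I read this part of the analysis, here is a rough sketch of counting triples and bytes per predicate. It is plain Python on made-up rows: the sample data and the function name `summarize_by_predicate` are assumptions for illustration, and the notebook itself may compute the sizes differently.

```python
predicate_representation_dict = {
    "label": "<http://www.w3.org/2000/01/rdf-schema#label>",
    "description": "<http://schema.org/description>",
    "alias": "<http://www.w3.org/2004/02/skos/core#altLabel>",
}

# Made-up sample rows, for illustration only.
Q2 = "<http://www.wikidata.org/entity/Q2>"
sample_triples = [
    (Q2, "<http://www.w3.org/2000/01/rdf-schema#label>", '"Earth"@en'),
    (Q2, "<http://schema.org/description>", '"example description"@en'),
    (Q2, "<http://www.w3.org/2004/02/skos/core#altLabel>", '"example alias"@en'),
    (Q2, "<http://www.wikidata.org/prop/direct/P31>",
     "<http://www.wikidata.org/entity/Q634>"),  # not a label/description/alias
]


def summarize_by_predicate(rows, predicates):
    """Count triples and UTF-8 bytes for each named predicate."""
    iri_to_name = {iri: name for name, iri in predicates.items()}
    summary = {name: {"triples": 0, "bytes": 0} for name in predicates}
    for subject, predicate, obj in rows:
        name = iri_to_name.get(predicate)
        if name is None:
            continue  # predicate is outside the set we are measuring
        summary[name]["triples"] += 1
        summary[name]["bytes"] += sum(
            len(part.encode("utf-8")) for part in (subject, predicate, obj)
        )
    return summary


summary = summarize_by_predicate(sample_triples, predicate_representation_dict)
for name, stats in summary.items():
    print(name, stats)
```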
For the other tasks (e.g. T342111
<https://phabricator.wikimedia.org/T342111>), however, it will not be as simple
as querying for Q-IDs in the subject position. If we did only that, we would
underestimate the size of the subgraph in question. I can see, for example,
that qualifiers and references follow a different pattern.
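To illustrate the concern, here is a small sketch that compares a subject-only count with a traversal that also follows statement, reference and value nodes. The IRI prefixes are those of the Wikidata RDF dump format as I understand it. The sample rows, the placeholder identifiers (`Q2-EXAMPLE-GUID`, `EXAMPLEHASH`) and the function names are made up for illustration, so please treat this as an assumption to be confirmed by the Wikidata team rather than as the correct query.

```python
from collections import deque

ENTITY = "http://www.wikidata.org/entity/"
STATEMENT = "http://www.wikidata.org/entity/statement/"
REFERENCE = "http://www.wikidata.org/reference/"
VALUE = "http://www.wikidata.org/value/"

# Made-up sample rows; the statement GUID and reference hash are placeholders.
stmt = STATEMENT + "Q2-EXAMPLE-GUID"
ref = REFERENCE + "EXAMPLEHASH"
rows = [
    (ENTITY + "Q2", "http://www.wikidata.org/prop/direct/P31", ENTITY + "Q634"),
    (ENTITY + "Q2", "http://www.wikidata.org/prop/P31", stmt),
    (stmt, "http://www.wikidata.org/prop/statement/P31", ENTITY + "Q634"),
    (stmt, "http://www.wikidata.org/prop/qualifier/P580", '"example value"'),
    (stmt, "http://www.w3.org/ns/prov#wasDerivedFrom", ref),
    (ref, "http://www.wikidata.org/prop/reference/P854", "http://example.org/source"),
    (ENTITY + "Q42", "http://www.wikidata.org/prop/direct/P31", ENTITY + "Q5"),
]


def is_auxiliary_node(iri):
    """True for statement, reference and value nodes (not for items)."""
    return iri.startswith((STATEMENT, REFERENCE, VALUE))


def collect_subgraph(all_rows, root):
    """Collect the root's triples plus those of reachable auxiliary nodes."""
    by_subject = {}
    for row in all_rows:
        by_subject.setdefault(row[0], []).append(row)
    seen, queue, result = {root}, deque([root]), []
    while queue:
        node = queue.popleft()
        for subject, predicate, obj in by_subject.get(node, []):
            result.append((subject, predicate, obj))
            # Follow only auxiliary nodes, never other items.
            if is_auxiliary_node(obj) and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return result


root = ENTITY + "Q2"
subject_only = [row for row in rows if row[0] == root]
full = collect_subgraph(rows, root)
print(len(subject_only), len(full))
```

In this toy example the subject-only filter finds 2 triples while the traversal finds 6. As far as I understand, reference and value nodes can also be shared between statements, so simply summing per-item subgraphs might count some triples more than once.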
I would suggest that we set up a short meeting with someone from the Wikidata
team who can explain this table to us. In the meeting, you could also briefly
explain the most relevant steps in this notebook so that they could provide a
high-level code review.
TASK DETAIL
https://phabricator.wikimedia.org/T337021