EBernhardson added a comment.
The SPARQL query endpoint that provides the categories to search against
doesn't appear to be returning all expected sub-categories:
ebernhardson@mwmaint1002:~$ curl -s -XPOST
http://wdqs-internal.discovery.wmnet/bigdata/namespace/categories/sparql?format=json
-d 'query=SELECT ?out WHERE {
SERVICE mediawiki:categoryTree {
bd:serviceParam mediawiki:start
<https://en.wikipedia.org/wiki/Category:Musicals_by_topic> .
bd:serviceParam mediawiki:direction "Reverse" .
bd:serviceParam mediawiki:depth 5 .
}
} ORDER BY ASC(?depth)
LIMIT 50' | jq '.results.bindings | map(.out.value)'
[
"https://en.wikipedia.org/wiki/Category:Musicals_by_topic",
"https://en.wikipedia.org/wiki/Category:Musicals_about_writers",
"https://en.wikipedia.org/wiki/Category:Musicals_about_World_War_II",
"https://en.wikipedia.org/wiki/Category:Musicals_set_in_the_Roaring_Twenties",
"https://en.wikipedia.org/wiki/Category:Plays_and_musicals_about_disability",
"https://en.wikipedia.org/wiki/Category:Musicals_about_World_War_I",
"https://en.wikipedia.org/wiki/Category:Musicals_about_the_Great_Depression"
]
In particular, the result is missing:
- Category:LGBT-related musicals
- Category:Teen musicals
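A quick way to repeat this comparison for other categories is to diff the list jq printed against the sub-categories one expects. The following is only a minimal sketch: the helper name `missing_categories` and the `expected` list are made up for illustration, and it assumes category names map to URLs by replacing spaces with underscores.

```python
import json

# URL prefix used by the categories in the result above.
CATEGORY_PREFIX = "https://en.wikipedia.org/wiki/Category:"


def missing_categories(returned_urls, expected_names):
    """Return the expected category names whose URLs are not in returned_urls.

    Assumption: a category name maps to its URL by replacing spaces with
    underscores and prepending CATEGORY_PREFIX.
    """
    returned = set(returned_urls)
    return [
        name
        for name in expected_names
        if CATEGORY_PREFIX + name.replace(" ", "_") not in returned
    ]


# The array printed by jq for the categoryTree query above.
jq_output = json.loads("""
[
  "https://en.wikipedia.org/wiki/Category:Musicals_by_topic",
  "https://en.wikipedia.org/wiki/Category:Musicals_about_writers",
  "https://en.wikipedia.org/wiki/Category:Musicals_about_World_War_II",
  "https://en.wikipedia.org/wiki/Category:Musicals_set_in_the_Roaring_Twenties",
  "https://en.wikipedia.org/wiki/Category:Plays_and_musicals_about_disability",
  "https://en.wikipedia.org/wiki/Category:Musicals_about_World_War_I",
  "https://en.wikipedia.org/wiki/Category:Musicals_about_the_Great_Depression"
]
""")

# Hypothetical list of sub-categories one expects to see (illustration only).
expected = ["Musicals about writers", "LGBT-related musicals", "Teen musicals"]

print(missing_categories(jq_output, expected))
# prints ['LGBT-related musicals', 'Teen musicals']
```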
I checked the latest dump (which should be what the SPARQL endpoint has loaded):
https://dumps.wikimedia.org/other/categoriesrdf/20191116/enwiki-20191116-categories.ttl.gz
The RDF includes the statements:
<https://en.wikipedia.org/wiki/Category:Teen_musicals>
mediawiki:isInCategory
<https://en.wikipedia.org/wiki/Category:Musicals_by_topic>,
<https://en.wikipedia.org/wiki/Category:Teens_in_fiction> .
<https://en.wikipedia.org/wiki/Category:LGBT-related_musicals>
mediawiki:isInCategory
<https://en.wikipedia.org/wiki/Category:LGBT_portrayals_in_media>,
<https://en.wikipedia.org/wiki/Category:LGBT_theatre>,
<https://en.wikipedia.org/wiki/Category:Musicals_by_topic> .
Oddly, if we ask Blazegraph about one of these categories, it doesn't seem to
know anything:
ebernhardson@mwmaint1002:~$ curl -s -XPOST
http://wdqs-internal.discovery.wmnet/bigdata/namespace/categories/sparql?format=json
-d 'query=SELECT ?out WHERE {
<https://en.wikipedia.org/wiki/Category:Teen_musicals>
mediawiki:isInCategory ?out
} LIMIT 50'
{
"head" : {
"vars" : [ "out" ]
},
"results" : {
"bindings" : [ ]
}
}
Asking about a different category in the same way works fine:
ebernhardson@mwmaint1002:~$ curl -s -XPOST
http://wdqs-internal.discovery.wmnet/bigdata/namespace/categories/sparql?format=json
-d 'query=SELECT ?out WHERE {
<https://en.wikipedia.org/wiki/Category:Musicals_about_writers>
mediawiki:isInCategory ?out
} LIMIT 50' | jq '.results.bindings | map(.out.value)'
[
"https://en.wikipedia.org/wiki/Category:Works_about_writers",
"https://en.wikipedia.org/wiki/Category:Musicals_by_topic"
]
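To find out how many categories are affected, the per-category check could be scripted instead of run by hand. This is a rough sketch under a few assumptions: the function names are invented for illustration, the endpoint is the internal one from the curl commands (so it only works from a host that can reach it), and it relies on the `mediawiki:` prefix being predefined by the endpoint, as it is in the queries above.

```python
import json
import urllib.parse
import urllib.request

# Endpoint copied from the curl commands above; internal-only.
ENDPOINT = (
    "http://wdqs-internal.discovery.wmnet"
    "/bigdata/namespace/categories/sparql?format=json"
)


def parents_query(category_url):
    """Build the same isInCategory query used in the curl commands."""
    return (
        "SELECT ?out WHERE { <%s> mediawiki:isInCategory ?out } LIMIT 50"
        % category_url
    )


def parse_parents(response_text):
    """Extract the ?out values from a SPARQL JSON response (like the jq filter)."""
    bindings = json.loads(response_text)["results"]["bindings"]
    return [binding["out"]["value"] for binding in bindings]


def fetch_parents(category_url, endpoint=ENDPOINT):
    """POST the query as form data, like `curl -XPOST -d 'query=...'`."""
    data = urllib.parse.urlencode({"query": parents_query(category_url)})
    with urllib.request.urlopen(endpoint, data=data.encode("utf-8")) as resp:
        return parse_parents(resp.read().decode("utf-8"))


def categories_without_parents(category_urls, fetch=fetch_parents):
    """Return the categories for which the endpoint reports no parent at all."""
    return [url for url in category_urls if not fetch(url)]


# Example call (needs access to the internal endpoint):
# categories_without_parents([
#     "https://en.wikipedia.org/wiki/Category:Teen_musicals",
#     "https://en.wikipedia.org/wiki/Category:Musicals_about_writers",
# ])
```

The `fetch` argument is only there so the check can be tried against canned responses without touching the endpoint.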
Summary: the dumps don't seem to be getting imported into Blazegraph
completely; perhaps some of the triples are erroring out during the import?
TASK DETAIL
https://phabricator.wikimedia.org/T238686