EBernhardson added a comment.

  The SPARQL query endpoint that provides the categories to search against
  doesn't appear to be returning all expected sub-categories:
  
    ebernhardson@mwmaint1002:~$ curl -s -XPOST http://wdqs-internal.discovery.wmnet/bigdata/namespace/categories/sparql?format=json -d 'query=SELECT ?out WHERE {
          SERVICE mediawiki:categoryTree {
              bd:serviceParam mediawiki:start <https://en.wikipedia.org/wiki/Category:Musicals_by_topic> .
              bd:serviceParam mediawiki:direction "Reverse" .
              bd:serviceParam mediawiki:depth 5 .
          }
    } ORDER BY ASC(?depth)
    LIMIT 50' | jq '.results.bindings | map(.out.value)'
    [
      "https://en.wikipedia.org/wiki/Category:Musicals_by_topic",
      "https://en.wikipedia.org/wiki/Category:Musicals_about_writers",
      "https://en.wikipedia.org/wiki/Category:Musicals_about_World_War_II",
      "https://en.wikipedia.org/wiki/Category:Musicals_set_in_the_Roaring_Twenties",
      "https://en.wikipedia.org/wiki/Category:Plays_and_musicals_about_disability",
      "https://en.wikipedia.org/wiki/Category:Musicals_about_World_War_I",
      "https://en.wikipedia.org/wiki/Category:Musicals_about_the_Great_Depression"
    ]
  
  In particular this is missing:
  
  - Category:LGBT-related musicals
  - Category:Teen musicals
  
  Checked the latest dump (which should be loaded into SPARQL): 
https://dumps.wikimedia.org/other/categoriesrdf/20191116/enwiki-20191116-categories.ttl.gz
  
  The RDF includes the statements:
  
    <https://en.wikipedia.org/wiki/Category:Teen_musicals> mediawiki:isInCategory <https://en.wikipedia.org/wiki/Category:Musicals_by_topic>,
            <https://en.wikipedia.org/wiki/Category:Teens_in_fiction> .
  
    <https://en.wikipedia.org/wiki/Category:LGBT-related_musicals> mediawiki:isInCategory <https://en.wikipedia.org/wiki/Category:LGBT_portrayals_in_media>,
            <https://en.wikipedia.org/wiki/Category:LGBT_theatre>,
            <https://en.wikipedia.org/wiki/Category:Musicals_by_topic> .
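
  To list every direct sub-category the dump records for a parent, rather
  than checking subjects one at a time, something like the following could
  work. This is only a sketch: the function name subcats_in_dump is made up
  here, and the awk filter is a line-based heuristic, not a Turtle parser. It
  assumes each mediawiki:isInCategory statement starts with its subject, has
  mediawiki:isInCategory as its first predicate, and ends with " ." at the
  end of a line, as in the excerpts above.

```shell
# Read Turtle on stdin and print the subject of every statement of the form
#   <subject> mediawiki:isInCategory <obj1>, <obj2> .
# whose object list contains the parent category IRI passed as $1.
subcats_in_dump() {
    awk -v parent="<$1>" '
        # Accumulate physical lines until the statement terminator.
        { stmt = stmt " " $0 }
        /\.[ \t]*$/ {
            n = split(stmt, tok, " ")
            if (tok[2] == "mediawiki:isInCategory") {
                for (i = 3; i <= n; i++) {
                    obj = tok[i]
                    sub(/,$/, "", obj)      # drop the object-list comma
                    if (obj == parent) { print tok[1]; break }
                }
            }
            stmt = ""
        }
    '
}

# Possible usage against the dump mentioned above (streams the whole file):
#   curl -s https://dumps.wikimedia.org/other/categoriesrdf/20191116/enwiki-20191116-categories.ttl.gz \
#     | zcat \
#     | subcats_in_dump https://en.wikipedia.org/wiki/Category:Musicals_by_topic
```

  The subjects are printed with their angle brackets, so they need to be
  stripped before comparing against the endpoint's JSON output.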
  
  Oddly, if we ask Blazegraph about one of these categories, it doesn't seem
  to know anything:
  
    ebernhardson@mwmaint1002:~$ curl -s -XPOST http://wdqs-internal.discovery.wmnet/bigdata/namespace/categories/sparql?format=json -d 'query=SELECT ?out WHERE {
    >     <https://en.wikipedia.org/wiki/Category:Teen_musicals> mediawiki:isInCategory ?out
    > } LIMIT 50'
    {
      "head" : {
        "vars" : [ "out" ]
      },
      "results" : {
        "bindings" : [ ]
      }
    }
  
  Asking about a different category in the same way works fine:
  
    ebernhardson@mwmaint1002:~$ curl -s -XPOST http://wdqs-internal.discovery.wmnet/bigdata/namespace/categories/sparql?format=json -d 'query=SELECT ?out WHERE {
        <https://en.wikipedia.org/wiki/Category:Musicals_about_writers> mediawiki:isInCategory ?out
    } LIMIT 50' | jq '.results.bindings | map(.out.value)'
    [
      "https://en.wikipedia.org/wiki/Category:Works_about_writers",
      "https://en.wikipedia.org/wiki/Category:Musicals_by_topic"
    ]
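
  To get a sense of how many categories are affected, the category IRIs
  found in the dump could be diffed against what the endpoint returns. A
  rough sketch, assuming two files with one bare IRI per line (no angle
  brackets or quotes); the function name missing_from_endpoint and both file
  names are hypothetical:

```shell
# Print every line of the dump-side list ($1) that does not appear in the
# endpoint-side list ($2). Keying on FILENAME rather than NR == FNR keeps
# the result correct when the endpoint list is empty.
missing_from_endpoint() {
    awk 'FILENAME == ARGV[1] { seen[$0]; next } !($0 in seen)' "$2" "$1"
}

# Possible usage:
#   endpoint.txt could come from piping the curl output through
#     jq -r '.results.bindings[].out.value'
#   dump.txt would hold the sub-category IRIs extracted from the .ttl.gz dump
#   missing_from_endpoint dump.txt endpoint.txt
```

  An empty result would mean the endpoint knows about everything in the dump
  list; anything printed is present in the dump but absent from Blazegraph.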
  
  Summary: It seems the dumps aren't being imported into Blazegraph properly;
  perhaps some of the triples are erroring out during the import?

TASK DETAIL
  https://phabricator.wikimedia.org/T238686
