Ladsgroup added subscribers: Lucas_Werkmeister_WMDE, Lydia_Pintscher, Ladsgroup.
Ladsgroup added a comment.


  So I looked at this. It's a bigger problem in general and it's due to the way 
we handle "not matching". It's slightly complex, so bear with me.
  
  The query that the query builder produces is this (with some modifications): https://w.wiki/unm
  
    SELECT ?item ?itemLabel ?instance ?instanceLabel WHERE {
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
      ?item (p:P106/ps:P106/(wdt:P279*)) wd:Q2526255.
      ?item (p:P31/ps:P31/(wdt:P279*)) ?instance.
      FILTER(?instance != wd:Q5)
    }
    LIMIT 5
  
  What is happening? For the "not matching" part, the query walks up the `P279` ladder: the `P279` of Q5, then its `P279`, and so on. Each item therefore becomes several rows, one per class on the ladder (WDQS is a graph database of triples, not of items).
  So the result would be:
  
  | wd:Q7251 | Alan Turing | wd:Q103940464 | continuant                    |
  | wd:Q7251 | Alan Turing | wd:Q99527517  | collection entity             |
  | wd:Q7251 | Alan Turing | wd:Q53617489  | independent continuant        |
  | wd:Q7251 | Alan Turing | wd:Q28813620  | set                           |
  | wd:Q7251 | Alan Turing | wd:Q27043950  | anatomical entity             |
  | wd:Q7251 | Alan Turing | wd:Q16887380  | group                         |
  | wd:Q7251 | Alan Turing | wd:Q26720107  | subject of a right            |
  | wd:Q7251 | Alan Turing | wd:Q35120     | entity                        |
  | wd:Q7251 | Alan Turing | wd:Q23958946  | individual entity             |
  | wd:Q7251 | Alan Turing | wd:Q159344    | heterotroph                   |
  | wd:Q7251 | Alan Turing | wd:Q7239      | organism                      |
  | wd:Q7251 | Alan Turing | wd:Q24229398  | agent                         |
  | wd:Q7251 | Alan Turing | wd:Q18336849  | item with given name property |
  | wd:Q7251 | Alan Turing | wd:Q830077    | subject                       |
  | wd:Q7251 | Alan Turing | wd:Q795052    | individual                    |
  | wd:Q7251 | Alan Turing | wd:Q45983014  | organisms by adaptation       |
  | wd:Q7251 | Alan Turing | wd:Q72638     | consumer                      |
  | wd:Q7251 | Alan Turing | wd:Q3778211   | legal person                  |
  | wd:Q7251 | Alan Turing | wd:Q215627    | person                        |
  | wd:Q7251 | Alan Turing | wd:Q164509    | omnivore                      |
  | wd:Q7251 | Alan Turing | wd:Q154954    | natural person                |
  | wd:Q7251 | Alan Turing | wd:Q5         | human                         |
  
  The filter only removes the last row (and leaves the rest), making the result both incorrect and full of duplicates.
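  
  A minimal sketch for reproducing the expansion in isolation, assuming the standard WDQS prefixes and pinning the query to a single item (wd:Q7251, taken from the table above). It should return one row per class reachable over the `P279` ladder:

  ```sparql
  # Sketch only: list every class Q7251 reaches via P31 followed by P279*.
  SELECT ?instance ?instanceLabel WHERE {
    wd:Q7251 (p:P31/ps:P31/(wdt:P279*)) ?instance.
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
  }
  ```

  Adding `FILTER(?instance != wd:Q5)` to this sketch would only drop the Q5 row, so the item would still match through every other row.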
  
  I talked to @Lucas_Werkmeister_WMDE and we came up with several solutions, but each has pros and cons.
  
  One:
  https://w.wiki/unp
  
    SELECT DISTINCT ?item ?itemLabel ?instance ?instanceLabel WHERE {
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
      ?item (p:P106/ps:P106/(wdt:P279*)) wd:Q2526255.
      ?item (p:P31/ps:P31) ?instance.
      MINUS { ?item (p:P31/ps:P31/(wdt:P279)*) wd:Q5. }
    }
    LIMIT 5
  
  Basically, take every item that has a P31, and remove anything that has Q5 in its P279 ladder.
  
  Pros:
  
  - Correct
  
  Cons:
  
  - It times out
  
  Two: https://w.wiki/unu
  The other way to handle it is to discard the P279 ladder for the "not matching" part.
  
    SELECT DISTINCT ?item ?itemLabel ?instance ?instanceLabel WHERE {
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
      ?item (p:P106/ps:P106/(wdt:P279*)) wd:Q2526255.
      MINUS {?item p:P31/ps:P31 wd:Q5. }
    }
    LIMIT 5
  
  Pros:
  
  - It's fast
  
  Cons:
  
  - It's limited: if I want to filter out galaxies from my results, it wouldn't exclude spiral galaxies, etc., because subclasses are no longer followed.
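  
  The same limitation can be sketched with IDs from the table above. This assumes the standard WDQS prefixes and that Q7251's only direct P31 value is Q5 (not verified here). A "not matching person" (wd:Q215627) condition is only detected when the `P279` ladder is followed:

  ```sparql
  # Sketch only: compare the two patterns for one item and one excluded class.
  SELECT ?withLadder ?withoutLadder WHERE {
    # Follows P279*: should be true, so MINUS on this pattern removes Q7251.
    BIND(EXISTS { wd:Q7251 (p:P31/ps:P31/(wdt:P279*)) wd:Q215627. } AS ?withLadder)
    # Direct P31 only: should be false under the assumption above,
    # so MINUS on this pattern keeps Q7251 in the results.
    BIND(EXISTS { wd:Q7251 (p:P31/ps:P31) wd:Q215627. } AS ?withoutLadder)
  }
  ```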
  
  I don't know which way to go. I think @Lydia_Pintscher should decide here.

TASK DETAIL
  https://phabricator.wikimedia.org/T272140

To: Ladsgroup
Cc: Ladsgroup, Lydia_Pintscher, Lucas_Werkmeister_WMDE, Aklapper, amy_rc, 
Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331