hoo added subscribers: mkroetzsch, daniel.
hoo added a comment.

  I've just looked into this, and I think the problem is that the suggester 
selects its suggestions in a very naive way. It queries for probable matches 
for each property ID used on an item individually, which strongly favors 
properties that have a few highly probable matches.
  
  Put more mathematically (partly taken from the thesis about this):
  
  `Q` is the item we want suggestions for and `Properties(Q)` is the set of 
properties used in Statements on it.
  
  For each pair `P1 ∈ Properties(Q)`, `P2 ∉ Properties(Q)` we look at the 
confidence that `P1 => P2` (without taking any further context into account). 
We also look at the confidence that `(P31, Q) => P2` (where `P31` is "instance 
of"). Finally, a list of all these `P2`s, ordered by confidence, is returned. 
The ones found via the `(P31, Q)` pair and the ones found via a given `P1` 
are treated equally.
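  
  As a rough illustration of the selection described above (this is not the 
actual suggester code): the rule table, the property IDs, the confidence 
values and the name `suggest_individually` are all made up, the `(P31, …)` 
antecedent is represented as a plain tuple with a hypothetical value, and 
keeping only the best rule per `P2` is an assumption about how duplicates are 
handled.

```python
# Hypothetical rule table: (antecedent, P2) -> confidence. All numbers are made up.
RULES = {
    ("P21", "P569"): 0.90,
    ("P21", "P19"): 0.60,
    ("P106", "P569"): 0.40,
    ("P106", "P800"): 0.30,
    (("P31", "Q5"), "P19"): 0.70,  # the (P31, ...) pair, treated like any other P1
}


def suggest_individually(antecedents, used, rules):
    """Rank each P2 by its single best rule, ignoring all other context."""
    best = {}
    for (antecedent, p2), confidence in rules.items():
        # Only rules whose left-hand side is on the item and whose
        # right-hand side is not yet used on it are candidates.
        if antecedent in antecedents and p2 not in used:
            best[p2] = max(best.get(p2, 0.0), confidence)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


used = {"P21", "P106", "P31"}
antecedents = {"P21", "P106", ("P31", "Q5")}
ranking = suggest_individually(antecedents, used, RULES)
```

  With these made-up numbers a single strong rule is enough to put a `P2` at 
the top, no matter what the other properties on the item say about it.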
  
  We probably want to move away from selecting these `P2`s individually by `P1` 
and instead either get correlations for all properties at the same time 
(`Properties(Q) => P2`) or combine the individual probability of each `P1` 
with the one from the `(P31, Q)` pair (`{P1, (P31, Q)} => P2`).
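  
  One possible way to combine the individual confidences, purely as a sketch: 
average the confidence for a `P2` over all antecedents on the item, counting 
a missing rule as 0. The averaging itself is an assumption (other 
combinations are just as plausible), and the rule table, IDs, numbers and the 
name `suggest_combined` are made up for illustration.

```python
# Hypothetical rule table: (antecedent, P2) -> confidence. All numbers are made up.
RULES = {
    ("P21", "P569"): 0.60,
    ("P106", "P569"): 0.60,
    (("P31", "Q5"), "P569"): 0.60,
    ("P106", "P800"): 0.95,  # one very strong rule, no support from anything else
}


def suggest_combined(antecedents, used, rules):
    """Rank each P2 by its mean confidence over all antecedents on the item."""
    totals = {}
    for (antecedent, p2), confidence in rules.items():
        if antecedent in antecedents and p2 not in used:
            totals[p2] = totals.get(p2, 0.0) + confidence
    # Dividing by the number of antecedents counts a missing rule as 0,
    # so a P2 backed by only one antecedent is penalised.
    n = len(antecedents)
    scores = {p2: total / n for p2, total in totals.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


used = {"P21", "P106", "P31"}
antecedents = {"P21", "P106", ("P31", "Q5")}
combined = suggest_combined(antecedents, used, RULES)
```

  Here `P569`, which every antecedent supports moderately, ends up ahead of 
`P800`, which has a single very confident rule. Ranking by the single best 
rule would order them the other way around.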

TASK DETAIL
  https://phabricator.wikimedia.org/T132839

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hoo
Cc: daniel, mkroetzsch, Stashbot, thiemowmde, JanZerebecki, Lydia_Pintscher, 
hoo, Sjoerddebruin, Nikki, Aklapper, D3r1ck01, Izno, Wikidata-bugs, aude, 
Mbch331



_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
