hoo added subscribers: mkroetzsch, daniel. hoo added a comment.
I've just looked into this, and I think the problem is that the suggester selects its suggestions in a very naive way. It queries for probable matches separately for each property id used on an item, which strongly favors properties that have a few highly probable matches.

Put more mathematically (partly taken from the thesis about this): `Q` is the item we want suggestions for and `Properties(Q)` is the set of properties used in statements on it. For each pair `P1 ∈ Properties(Q)`, `P2 ∉ Properties(Q)` we look at the confidence of `P1 => P2` (without taking any further context into account). We also look at the confidence of `(P31, Q) => P2` (where `P31` is "instance of"). A list of all these `P2`s, ordered by confidence, is then returned; the ones found via the `(P31, Q)` pair and the ones found via a single `P1` are treated equally.

We probably want to move away from selecting these `P2`s individually per `P1`. Instead we could either get correlations for all properties at the same time (`Properties(Q) => P2`), or combine the individual probability of each `P1` with the one from the `(P31, Q)` pair (`{P1, (P31, Q)} => P2`).

TASK DETAIL
https://phabricator.wikimedia.org/T132839

To: hoo
Cc: daniel, mkroetzsch, Stashbot, thiemowmde, JanZerebecki, Lydia_Pintscher, hoo, Sjoerddebruin, Nikki, Aklapper, D3r1ck01, Izno, Wikidata-bugs, aude, Mbch331
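
To make the difference in the comment above concrete, here is a minimal Python sketch, not the extension's actual code. It contrasts ranking each `P2` by its single best rule with ranking it by a confidence combined over all antecedents. The rule table, the function names, the placeholder ids (`P_A`, `Q_CLASS`, ...) and the choice of a plain mean as the combination are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical rule table: antecedent -> {suggested property: confidence}.
# An antecedent is either a property id or a ("P31", class) pair.
EXAMPLE_RULES = {
    "P_A": {"P_X": 0.9, "P_Y": 0.5},
    "P_B": {"P_Y": 0.6},
    ("P31", "Q_CLASS"): {"P_Y": 0.7},
}


def suggest_individually(rules, antecedents, used, limit=5):
    """Rank every candidate P2 by the single best rule pointing to it."""
    best = {}
    for antecedent in antecedents:
        for p2, confidence in rules.get(antecedent, {}).items():
            if p2 not in used:
                best[p2] = max(best.get(p2, 0.0), confidence)
    # Highest confidence first; ties broken by property id for stable output.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


def suggest_combined(rules, antecedents, used, limit=5):
    """Rank every candidate P2 by its mean confidence over all antecedents.

    An antecedent without a rule for P2 contributes 0, so a property
    backed by only one strong rule no longer dominates the list.
    """
    if not antecedents:
        return []
    totals = defaultdict(float)
    for antecedent in antecedents:
        for p2, confidence in rules.get(antecedent, {}).items():
            if p2 not in used:
                totals[p2] += confidence
    ranked = [(p2, total / len(antecedents)) for p2, total in totals.items()]
    return sorted(ranked, key=lambda kv: (-kv[1], kv[0]))[:limit]


used = {"P_A", "P_B"}
antecedents = ["P_A", "P_B", ("P31", "Q_CLASS")]
print(suggest_individually(EXAMPLE_RULES, antecedents, used))
print(suggest_combined(EXAMPLE_RULES, antecedents, used))
```

With this made-up data, the individual ranking puts `P_X` first because of its one strong rule, while the combined ranking puts `P_Y` first because every antecedent supports it. Other combinations (for example weighting the `(P31, Q)` pair differently) would fit the same structure.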