[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
hoo added a comment.

In T132839#2776630, @Esc3300 wrote:
> Maybe P106 and P17 could be "classifying" ones as well.

P106 probably could, given it's only used on humans. I also thought about this myself before… please create a separate ticket for this. P17 is used on so many different subjects (buildings, cities, streets, …) that boosting it might result in weird correlations.

TASK DETAIL
https://phabricator.wikimedia.org/T132839

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hoo
Cc: Esc3300, AnjaJentzsch, Ladsgroup, Tobi_WMDE_SW, daniel, mkroetzsch, Stashbot, thiemowmde, JanZerebecki, Lydia_Pintscher, hoo, Sjoerddebruin, Nikki, Aklapper, D3r1ck01, Izno, Wikidata-bugs, aude, Mbch331

___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
hoo added a comment.

I poked at this a bit on Thursday and Friday and came up with a new idea which will (hopefully) significantly improve the suggestions given.

Currently there are two types of correlations that the suggester considers:

- "Classifying" ones ("instance of" and "subclass of"), where we take into account both the Property id and the value of Statements.
- Non-classifying correlations, where only the fact that a Statement with a certain Property id exists on an Item is considered.

Right now these two types of correlations are treated equally when suggesting new Properties to use. While playing around with various options, I found that the suggestions based on the "classifying" correlations are usually far better than the ones based purely on the fact that two Properties are often used together.

Due to this, we decided to implement a setting which will allow us to adjust the weight given to the correlation types in question. The pull request for this is at https://github.com/Wikidata-lib/PropertySuggester/pull/179 and the change will need a new PropertySuggester 4.0.

Once this has been deployed, we can undo the workarounds for this bug and then see what the right weight for classifying correlations should be. In my tests rather "extreme" values like 0.75 : 0.25 or even 0.8 : 0.2 worked best, so I would suggest trying these for starters.

Note: Suggestions for qualifiers and references, and suggestions for Items without instance of/subclass of, won't be affected by this at all.
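To make the weighting idea concrete, here is a minimal sketch of how two kinds of correlation scores could be blended with an adjustable weight. It is not taken from the PropertySuggester code or from the pull request; the function name `rank_suggestions`, the parameter `classifying_weight`, and all probabilities are made up for illustration.

```python
from collections import defaultdict


def rank_suggestions(classifying, non_classifying, classifying_weight=0.75):
    """Blend two {property id: probability} maps into one ranked list.

    Hypothetical helper: classifying_weight is the share given to the
    "classifying" correlations, the rest goes to the non-classifying ones.
    """
    if not 0.0 <= classifying_weight <= 1.0:
        raise ValueError("classifying_weight must be between 0 and 1")
    other_weight = 1.0 - classifying_weight

    scores = defaultdict(float)
    for pid, probability in classifying.items():
        scores[pid] += classifying_weight * probability
    for pid, probability in non_classifying.items():
        scores[pid] += other_weight * probability

    # Highest combined score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented numbers for an item that is an instance of "river":
# the classifying correlation points at P403 (mouth of the watercourse),
# the plain co-occurrence data points at P569 (date of birth).
classifying = {"P403": 0.6}
non_classifying = {"P569": 0.9, "P403": 0.1}

equal = rank_suggestions(classifying, non_classifying, classifying_weight=0.5)
skewed = rank_suggestions(classifying, non_classifying, classifying_weight=0.75)
print(equal[0][0], skewed[0][0])
```

With these invented inputs, equal weighting ranks the human property first, while a 0.75 : 0.25 split puts the river property on top.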
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2016-11-02T12:11:32Z] Updated Wikidata's property suggester with data from Monday's json dump and applied the T132839 workarounds
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2016-10-13T10:31:25Z] Ran (updated) T132839-Workarounds.sh from my home in terbium
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
hoo added a comment.

I thought about this for a bit and have the following improvement in mind. It works on the data structure we currently have, so we only need a rather minimal change in the extension code to achieve it. The current model is described on a very high level in T132839#2270026.

My suggestion: For all properties used on an Item, get the probabilities for their use together with other properties, then average these across all used properties.

Additionally we could experiment with weighting the average. This could be done by data type (so that we could, for example, make external ids weigh in less). We could also try to put less weight on properties that are used together with a lot of other properties, as that might indicate that the property is not well suited for getting topic-specific suggestions.
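The averaging proposal above can be sketched as follows. This is only an illustration under assumed data shapes: `pair_probability` stands in for the correlation table as a dictionary keyed by (used property, candidate property), and `average_suggestions`, the weights, and the probabilities are all hypothetical.

```python
def average_suggestions(used, pair_probability, weights=None):
    """Weighted average of co-occurrence probabilities per candidate.

    used: property ids already present on the item.
    pair_probability: {(used_pid, candidate_pid): probability}.
    weights: optional {used_pid: weight}; missing entries default to 1.0.
    """
    weights = weights or {}
    used_set = set(used)
    total_weight = sum(weights.get(pid, 1.0) for pid in used_set)
    if total_weight == 0:
        return {}

    scores = {}
    for (source, candidate), probability in pair_probability.items():
        # Only pairs that start at a used property and lead to a new one.
        if source not in used_set or candidate in used_set:
            continue
        weighted = weights.get(source, 1.0) * probability
        scores[candidate] = scores.get(candidate, 0.0) + weighted

    # A used property without a pair for a candidate contributes zero.
    return {pid: score / total_weight for pid, score in scores.items()}


# Invented example: P17 (country) and an external id (P214) on one item.
used = ["P17", "P214"]
pair_probability = {
    ("P17", "P131"): 0.6,
    ("P17", "P569"): 0.1,
    ("P214", "P569"): 0.8,
}

plain = average_suggestions(used, pair_probability)
damped = average_suggestions(used, pair_probability, weights={"P214": 0.25})
print(plain, damped)
```

In this made-up case the plain average favours P569, while giving the external id a quarter of the weight lets P131 come out ahead.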
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2016-10-04T13:40:34Z] Updated Wikidata's property suggester with data from Monday's json dump and applied the T132839 workarounds
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
hoo added a comment. Further updated the workaround: It now also removes suggestions based on P18 (image) and P373 (commons category).
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
Stashbot added a comment. Mentioned in SAL (#wikimedia-operations) [2016-09-14T15:50:56Z] Ran T132839-Workarounds.sh from my home in terbium (see T132839)
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
Stashbot added a comment. Mentioned in SAL [2016-09-07T01:07:24Z] Updated Wikidata's property suggester with data from Monday's json dump and applied the T132839 workarounds
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
Stashbot added a comment. Mentioned in SAL [2016-08-28T16:51:23Z] Ran T132839-Workarounds.sh from my home in terbium (see T132839)
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
hoo added a comment.

Updated the workaround further, per @Sjoerddebruin:

  hoo@terbium:~$ bash T132839-Workarounds.sh
  Removing ext ids in item context
  Batch 1: 0 rows
  Removing P641 in item context
  Batch 1: 0 rows
  Removing P1344 in item context
  Batch 1: 66 rows
  Batch 2: 0 rows
  Removing P463 in item context
  Batch 1: 110 rows
  Batch 2: 0 rows
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
Stashbot added a comment. Mentioned in SAL [2016-08-24T21:08:26Z] Ran DELETE FROM wbs_propertypairs WHERE pid1 = '641' on Wikidata for T132839
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
Stashbot added a comment. Mentioned in SAL [2016-08-16T12:53:43Z] Put a better workaround for T132839 in place: Only remove property pairs with context = "item". This keeps ref and qualifier pairs for ext ids intact.
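As an illustration of the context-restricted workaround, the sketch below deletes pairs for one source property in batches, but only where the context is "item". It is not the actual T132839-Workarounds.sh script: it runs against an in-memory SQLite table with a simplified, assumed schema (only `pid1` and `context` are named in this thread; `pid2` and `probability` are assumptions), and `delete_item_pairs` is a hypothetical helper.

```python
import sqlite3


def delete_item_pairs(conn, pid1, batch_size=1000):
    """Delete item-context pairs for pid1 in batches; return rows per batch."""
    batch_counts = []
    while True:
        cursor = conn.execute(
            "DELETE FROM wbs_propertypairs WHERE rowid IN ("
            " SELECT rowid FROM wbs_propertypairs"
            " WHERE pid1 = ? AND context = 'item' LIMIT ?)",
            (pid1, batch_size),
        )
        conn.commit()
        batch_counts.append(cursor.rowcount)
        # Stop once a batch finds nothing left to delete.
        if cursor.rowcount == 0:
            break
    return batch_counts


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wbs_propertypairs ("
    " pid1 INTEGER, pid2 INTEGER, probability REAL, context TEXT)"
)
conn.executemany(
    "INSERT INTO wbs_propertypairs VALUES (?, ?, ?, ?)",
    [
        (641, 54, 0.3, "item"),
        (641, 1344, 0.2, "item"),
        (641, 580, 0.4, "qualifier"),  # kept: not in item context
        (31, 641, 0.1, "item"),        # kept: P641 is only the target here
    ],
)

batches = delete_item_pairs(conn, 641, batch_size=1)
remaining = conn.execute("SELECT COUNT(*) FROM wbs_propertypairs").fetchone()[0]
print(batches, remaining)
```

The final empty batch mirrors the "Batch N: 0 rows" lines in the script output quoted elsewhere in this thread; qualifier and reference pairs are left untouched.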
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
Stashbot added a comment. Mentioned in SAL [2016-08-08T21:55:14Z] Updated Wikidata's property suggester with data from today's json dump and removed the external identifiers as a workaround for T132839
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
Stashbot added a comment. Mentioned in SAL [2016-07-07T08:33:55Z] Updated Wikidata's property suggester with data from Monday's json dump and removed the external identifiers as a workaround for T132839
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
Stashbot added a comment. Mentioned in SAL [2016-05-28T19:47:45Z] Updated Wikidata's property suggester with data from Monday's json dump and removed the external identifiers as a workaround for https://phabricator.wikimedia.org/T132839
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
thiemowmde added a comment. Thanks for the clarification, this is indeed an important difference. Properties like ISBN are not classifying by value but by the pure fact that they exist.
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
hoo added a comment.

In https://phabricator.wikimedia.org/T132839#2322937, @thiemowmde wrote:
> […]
> I suggest to:
>
> 1. Add unspecific identifiers that apply to all kinds of items to the `$wgPropertySuggesterClassifyingPropertyIds` setting.

Why? I don't see how an identifier **value** could ever be classifying (`$wgPropertySuggesterClassifyingPropertyIds` is about classifying based on values, not just the properties used, although that might not be correctly implemented right now for strings/external ids).
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
thiemowmde added a comment.

I believe this is wrong. There are "external-identifier" properties that really should be suggested the moment it becomes clear what kind of item you are editing. For example, something with "instanceof book" should get an ISBN number a.s.a.p., and the other way around. Or: an item that happens to be a Rijksmonument in the Netherlands must get a Rijksmonument ID.

I suggest to:

1. Add unspecific identifiers that apply to all kinds of items to the `$wgPropertySuggesterClassifyingPropertyIds` setting.
2. Add stuff to `$wgPropertySuggesterDeprecatedIds` that should never be suggested unless you search for it explicitly.
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
daniel added a comment. It does not look like PropertySuggester-Python is currently applying any filtering based on data type (or value type). If we want to add such filtering, the best place would probably be in `write_row` in `CsvWriter.php`. We could also filter while reading the input file, but we would have to do this twice, in `JsonReader` and in `XmlReader`. Note however, if we filter out properties with specific data types completely, such properties will never be suggested.
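A rough sketch of what filtering at write time might look like, and of one way around the caveat in the last sentence: rows are dropped only when the excluded data type belongs to the source property, so such properties can still appear as suggestion targets. None of this mirrors the real PropertySuggester-Python code; `write_filtered_rows`, the row layout, and the data-type lookup are hypothetical, and the data type id "external-id" is an assumption about how the dump labels external identifiers.

```python
import csv
import io

# Assumption: data types that should not act as the source of a correlation.
EXCLUDED_SOURCE_DATATYPES = {"external-id"}


def write_filtered_rows(rows, datatypes, out):
    """Write (pid1, pid2, probability) rows, skipping excluded sources.

    rows: iterable of (pid1, pid2, probability) tuples (assumed layout).
    datatypes: {property id: data type id} lookup (assumed to be available).
    Returns the number of rows written.
    """
    writer = csv.writer(out)
    written = 0
    for pid1, pid2, probability in rows:
        # Filter on the source only; pid2 may still be an external id.
        if datatypes.get(pid1) in EXCLUDED_SOURCE_DATATYPES:
            continue
        writer.writerow([pid1, pid2, probability])
        written += 1
    return written


datatypes = {"P31": "wikibase-item", "P50": "wikibase-item", "P212": "external-id"}
rows = [
    ("P50", "P212", 0.4),  # kept: external id is only the suggestion target
    ("P212", "P50", 0.7),  # dropped: external id is the source
    ("P31", "P50", 0.2),   # kept
]

buffer = io.StringIO()
kept = write_filtered_rows(rows, datatypes, buffer)
print(kept)
print(buffer.getvalue())
```

Filtering on both columns instead would reproduce the problem noted above: properties of the excluded data types would never be suggested.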
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
hoo added a comment.

In https://phabricator.wikimedia.org/T132839#2280617, @thiemowmde wrote:
> We also came up with a possible improvement: Some properties like "instance of" and "Commons category" are not selective. The fact that this property exists on an item does not say anything. We think it's a good idea to add such properties to a "non-selective" blacklist (or to the existing blacklist). This should reduce noise.

We have special handling for instance of and subclass of that avoids this behaviour (these are "classifying properties"). Excluding very generic ones like identifiers and certain string ones is probably also a good idea (or, in the long run, weight them lower?).

In https://phabricator.wikimedia.org/T132839#2281371, @thiemowmde wrote:
> FYI, I did another run of code review on https://github.com/Wikidata-lib/PropertySuggester-Python and https://github.com/Wikidata-lib/PropertySuggester and could not find more suspicious code. The Python script should produce massive amounts of warnings when a datatype is missing. Does this happen? Are these logs reviewed after the script is run?

I saw it before, but very rarely (like once or twice for a dump run at some point). I'll probably do a new dump run tomorrow and will examine the logs after, but I don't think that's going to give us any new insights.
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
thiemowmde added a comment. FYI, I did another run of code review on https://github.com/Wikidata-lib/PropertySuggester-Python and https://github.com/Wikidata-lib/PropertySuggester and could not find more suspicious code. The Python script should produce massive amounts of warnings when a datatype is missing. Does this happen? Are these logs reviewed after the script is run?
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
thiemowmde added a comment. We also came up with a possible improvement: Some properties like "instance of" and "Commons category" are not selective. The fact that this property exists on an item does not say anything. We think it's a good idea to add such properties to a "non-selective" blacklist (or to the existing blacklist). This should reduce noise. Hm. I just realized that "Commons category" does tell you one thing: That there are pictures and the item should have an "image" property too.
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
hoo added a comment.

In https://phabricator.wikimedia.org/T132839#2280542, @Tobi_WMDE_SW wrote:
> @Lydia_pintscher suggests to look into the code again and find out whether the problem comes from a change to Wikidata that's not reflected in PropertySuggester.

We already did that three times by now, I think… I'm not sure what the point of repeating it is.

The next thing to look at here would be (in my opinion) to get the exact query that the suggester is running and then look through the results.

In the end I'm fairly sure my analysis at https://phabricator.wikimedia.org/T132839#2270026 is correct. I find it quite unlikely that the rather naive current algorithm can give meaningful results for larger items.
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
Tobi_WMDE_SW added a comment. @Lydia_pintscher suggests to look into the code again and find out whether the problem comes from a change to Wikidata that's not reflected in PropertySuggester.
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
Nikki added a comment. The suggestions right now seem to be better than before, e.g. for the example in the description I get `P131`, mouth of the watercourse, sex or gender, date of birth. That still includes human properties, but at least mouth of the watercourse actually shows up now. Not quite the same, but presumably caused by how it decides which properties to select: I also often see country-specific properties for large countries show up as suggestions for items in other countries, e.g. https://www.wikidata.org/wiki/Q504582 currently suggests "China administrative division code" despite the item having the country set to the USA. If there's a change that would improve things like that too, that would be awesome. :)
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
Lydia_Pintscher added a comment. @hoo removed the external IDs from the correlation table. This seems to improve the situation for now. We'll still need to find a better solution though.
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
Lydia_Pintscher added a comment. @hoo tried the old correlation data and the suggestions are just as bad. This indicates a problem with the code.
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
Stashbot added a comment. Mentioned in SAL [2016-05-06T11:10:01Z] Reverted the property suggester data to data from the 20160411 dump (done testing https://phabricator.wikimedia.org/T132839)
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
Stashbot added a comment. Mentioned in SAL [2016-05-06T11:02:30Z] Overwrote property suggester data with data from the 20160215 dump (https://phabricator.wikimedia.org/T132839)
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
hoo added a comment.

In https://phabricator.wikimedia.org/T132839#2239348, @thiemowmde wrote:
> Ideas we had in today's meeting:
>
> - Could it be that PropertySuggester ignores all new data types (e.g. identifier)? On first look, https://github.com/wmde/wbs_propertypairs/blob/master/20160314/wbs_propertypairs.csv.gz is 3.42 MB but https://github.com/wmde/wbs_propertypairs/blob/master/20160411/wbs_propertypairs.csv.gz from the next month is 3.57 MB.

That should not be the case; I addressed it with https://github.com/Wikidata-lib/PropertySuggester-Python/commit/6fc5610f3e0383d676c0abde20dcee7029274723 (which is applied on the host where I create the suggester data).

> - We can undo the last database update to the one from March and see if the problem is still there. If it is, the problem is code. Otherwise it's just the data we have.

Sure, we can try this for a bit.
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
Lydia_Pintscher added a comment. Yeah I think this is quite recent too. I don't think splitting it between identifiers/nonidentifiers will solve the underlying issue here. I take it from Marius' comments that the correlation data didn't change significantly. This leads me to suspect a change in the code being at fault.
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
Nikki added a comment. Its obsession with human properties does seem to be quite recent and noticeable. If it had been like this all along, I'm not sure why I would suddenly be noticing it so much now. I'm not sure how your suggestion would work. None of the suggestions for the example I gave are identifiers and if you want to add "mouth of the watercourse" (which is also not an identifier), wouldn't you still get the same set of suggestions? Personally I really like that I don't have to use specific "add" buttons to add specific types of statements. I don't pay any attention to which one I click (actually, most of the time I use the KeyShortcuts gadget and press "a", which appears to trigger the one in the identifiers section), they're all just statements to me and if I'm adding a bunch of them, I don't mentally filter them into different groups first.
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
Sjoerddebruin added a comment.

In https://phabricator.wikimedia.org/T132839#2213354, @hoo wrote:
> Can anyone confirm this is a recent regression? I have the feeling it is, but I don't really know the old suggestions well enough to say for sure.
>
> A possible solution I could think of: Make the suggester smarter about statement groups (so that it only suggests external ids in the external id section) and then reduce the minimal probability for showing suggestions?

There were some overlapping areas before, but yes, it seems to have gotten worse over the last few weeks. I agree with your solution!
[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items
hoo added a comment. Can anyone confirm this is a recent regression? I have the feeling it is, but I don't really know the old suggestions well enough to say for sure. A possible solution I could think of: Make the suggester smarter about statement groups (so that it only suggests external ids in the external id section) and then reduce the minimal probability for showing suggestions?