[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-11-08 Thread hoo
hoo added a comment.

In T132839#2776630, @Esc3300 wrote:
> Maybe P106 and P17 could be "classifying" ones as well.


P106 probably could, given it's only used on humans. I also thought about this myself before… please create a separate ticket for this.
P17 is used on so many different subjects (buildings, cities, streets, …) that boosting it might result in weird correlations.

TASK DETAIL
  https://phabricator.wikimedia.org/T132839

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hoo
Cc: Esc3300, AnjaJentzsch, Ladsgroup, Tobi_WMDE_SW, daniel, mkroetzsch, Stashbot, thiemowmde, JanZerebecki, Lydia_Pintscher, hoo, Sjoerddebruin, Nikki, Aklapper, D3r1ck01, Izno, Wikidata-bugs, aude, Mbch331

___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-11-06 Thread hoo
hoo added a comment.
I poked at this a bit on Thursday and Friday and came up with a new idea which will (hopefully) significantly improve the suggestions given.

Currently there are two types of correlations that the suggester considers:


- "Classifying" ones ("instance of" and "subclass of"), where we take into account the Property id and the value of Statements.
- Non-classifying correlations, where only the fact that a Statement with a certain Property id exists on an Item is considered.


Right now these two types of correlations are treated equally when suggesting new Properties to use.

While playing around with various options, I found that the suggestions based on the "classifying" correlations are usually much better than the ones based purely on the fact that two Properties are often used together. Because of this, we decided to implement a setting which will allow us to adjust the weight given to the correlation types in question.

The pull request for this is at https://github.com/Wikidata-lib/PropertySuggester/pull/179 and the change will need a new PropertySuggester 4.0.

Once this has been deployed, we can undo the workarounds for this bug and then see what the right weight for classifying correlations should be. In my tests rather "extreme" values like 0.75 : 0.25 or even 0.8 : 0.2 worked best, so I would suggest trying these for starters.
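
To make the weighting concrete, here is a minimal sketch of how the two correlation types could be blended. This is not the code from the pull request: the function name `combine_suggestions`, the input dictionaries and the probabilities are made up for illustration, and only the 0.75 : 0.25 split is taken from the tests mentioned above.

```python
from collections import defaultdict


def combine_suggestions(classifying, non_classifying, classifying_weight=0.75):
    """Blend two {property_id: probability} maps into one ranked list.

    classifying_weight is the share given to classifying correlations;
    the remainder (1 - classifying_weight) goes to the non-classifying ones.
    """
    if not 0.0 <= classifying_weight <= 1.0:
        raise ValueError("classifying_weight must be between 0 and 1")

    scores = defaultdict(float)
    for pid, probability in classifying.items():
        scores[pid] += classifying_weight * probability
    for pid, probability in non_classifying.items():
        scores[pid] += (1.0 - classifying_weight) * probability

    # Highest combined score first; ties are broken by property id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical numbers for a river item: the classifying correlation
# favours P403 (mouth of the watercourse), while plain co-occurrence
# favours P21 (sex or gender).
classifying = {"P403": 0.8, "P21": 0.1}
non_classifying = {"P403": 0.2, "P21": 0.9}
ranked = combine_suggestions(classifying, non_classifying, classifying_weight=0.75)
for pid, score in ranked:
    print(pid, round(score, 3))
```

With an equal 0.5 : 0.5 split the two properties in this made-up example would tie, which is roughly the situation the new setting is meant to avoid.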

Note: Suggestions for qualifiers and references, and suggestions for Items without instance of / subclass of, won't be affected by this at all.


[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-11-02 Thread Stashbot
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2016-11-02T12:11:32Z]  Updated Wikidata's property suggester with data from Monday's json dump and applied the T132839 workarounds


[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-10-13 Thread Stashbot
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2016-10-13T10:31:25Z]  Ran (updated) T132839-Workarounds.sh from my home in terbium


[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-10-07 Thread hoo
hoo added a comment.
I thought about this for a bit and have the following improvement in mind. It works on the data structure we currently have, so we only need a rather minimal change in the extension code to achieve it.

The current model being used is described on a very high level in T132839#2270026.

My suggestion:
For all properties used on an Item, get the probabilities for their use together with other properties. Average this across all used properties then.
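
As a rough illustration of the averaging idea (not the extension's actual code), the sketch below assumes the pair probabilities are available as a dictionary keyed by `(used_pid, candidate_pid)`. The function name `average_suggestions`, the optional `weights` argument and all numbers are hypothetical.

```python
def average_suggestions(used_properties, pair_probabilities, weights=None):
    """Average P(candidate | used property) over all properties on an item.

    pair_probabilities maps (used_pid, candidate_pid) -> probability.
    weights optionally maps used_pid -> weight (default 1.0 for each).
    """
    weights = weights or {}
    used = set(used_properties)
    total_weight = sum(weights.get(pid, 1.0) for pid in used)
    if total_weight == 0:
        return {}

    sums = {}
    for (used_pid, candidate), probability in pair_probabilities.items():
        # Only pairs starting from a used property count, and properties
        # already on the item are not suggested again.
        if used_pid in used and candidate not in used:
            weight = weights.get(used_pid, 1.0)
            sums[candidate] = sums.get(candidate, 0.0) + weight * probability

    # A missing pair counts as probability 0, so divide by the full weight.
    return {candidate: value / total_weight for candidate, value in sums.items()}


# Made-up probabilities for an item that already uses P31 and P17.
pairs = {
    ("P31", "P131"): 0.6,
    ("P17", "P131"): 0.4,
    ("P17", "P21"): 0.2,
    ("P31", "P17"): 0.9,  # ignored: P17 is already on the item
}
averaged = average_suggestions(["P31", "P17"], pairs)
print(averaged)
```

Dividing by the total weight rather than by the number of matching pairs means a candidate that correlates with only one of many used properties ends up with a low score.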

Additionally we could experiment with weighting the average. This could be done by data type (so that we could for example make external ids weigh in less). We could also try to put less weight on properties that are being used together with a lot of other properties, as that might indicate that the property is not well suited for getting topic specific suggestions.


[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-10-04 Thread Stashbot
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2016-10-04T13:40:34Z]  Updated Wikidata's property suggester with data from Monday's json dump and applied the T132839 workarounds


[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-09-14 Thread hoo
hoo added a comment.
Further updated the workaround:
It now also removes suggestions based on P18 (image) and P373 (commons category).


[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-09-14 Thread Stashbot
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2016-09-14T15:50:56Z]  Ran T132839-Workarounds.sh from my home in terbium (see T132839)


[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-09-06 Thread Stashbot
Stashbot added a comment.
Mentioned in SAL [2016-09-07T01:07:24Z]  Updated Wikidata's property suggester with data from Monday's json dump and applied the T132839 workarounds


[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-08-28 Thread Stashbot
Stashbot added a comment.
Mentioned in SAL [2016-08-28T16:51:23Z]  Ran T132839-Workarounds.sh from my home in terbium (see T132839)


[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-08-28 Thread hoo
hoo added a comment.
Updated the workaround further, per @Sjoerddebruin:

hoo@terbium:~$ bash T132839-Workarounds.sh 
Removing ext ids in item context
Batch 1: 0 rows

Removing P641 in item context
Batch 1: 0 rows

Removing P1344 in item context
Batch 1: 66 rows
Batch 2: 0 rows

Removing P463 in item context
Batch 1: 110 rows
Batch 2: 0 rows
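
T132839-Workarounds.sh itself is not included in this thread. A rough sketch of the batched removal it reports might look like the following, assuming a simplified `wbs_propertypairs` table with only `pid1`, `pid2` and `context` columns and using an in-memory SQLite database in place of the production one. The function name `remove_item_context_pairs`, the batch size and the sample rows are all hypothetical.

```python
import sqlite3


def remove_item_context_pairs(conn, pid, batch_size=1000):
    """Delete rows for pid with context 'item' in batches.

    Returns the number of rows deleted per batch; the final entry is 0.
    """
    batches = []
    while True:
        cursor = conn.execute(
            "DELETE FROM wbs_propertypairs WHERE rowid IN ("
            " SELECT rowid FROM wbs_propertypairs"
            " WHERE pid1 = ? AND context = 'item' LIMIT ?)",
            (pid, batch_size),
        )
        conn.commit()
        batches.append(cursor.rowcount)
        if cursor.rowcount == 0:
            return batches


# Simplified stand-in table (assumption: the real one has more columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wbs_propertypairs (pid1 INTEGER, pid2 INTEGER, context TEXT)"
)
conn.executemany(
    "INSERT INTO wbs_propertypairs VALUES (?, ?, ?)",
    [
        (1344, 21, "item"),
        (1344, 569, "item"),
        (1344, 27, "item"),
        (1344, 585, "qualifier"),  # kept: not in item context
        (31, 1344, "item"),        # kept: different pid1
    ],
)
batches = remove_item_context_pairs(conn, 1344, batch_size=2)
for number, rows in enumerate(batches, start=1):
    print(f"Batch {number}: {rows} rows")
```

Restricting the delete to `context = 'item'` leaves the qualifier and reference pairs untouched, and deleting in small batches keeps each statement short.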


[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-08-24 Thread Stashbot
Stashbot added a comment.
Mentioned in SAL [2016-08-24T21:08:26Z]  Ran DELETE FROM wbs_propertypairs WHERE pid1 = '641' on Wikidata for T132839


[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-08-16 Thread Stashbot
Stashbot added a comment.
Mentioned in SAL [2016-08-16T12:53:43Z]  Put a better workaround for T132839 in place: Only remove property pairs with context = "item". This keeps ref and qualifier pairs for ext ids intact.


[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-08-08 Thread Stashbot
Stashbot added a comment.
Mentioned in SAL [2016-08-08T21:55:14Z]  Updated Wikidata's property suggester with data from today's json dump and removed the external identifiers as a workaround for T132839


[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-07-07 Thread Stashbot
Stashbot added a comment.
Mentioned in SAL [2016-07-07T08:33:55Z]  Updated Wikidata's property suggester with data from Monday's json dump and removed the external identifiers as a workaround for T132839


[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-05-28 Thread Stashbot
Stashbot added a comment.


  Mentioned in SAL [2016-05-28T19:47:45Z]  Updated Wikidata's property 
suggester with data from Monday's json dump and removed the external 
identifiers as a workaround for https://phabricator.wikimedia.org/T132839



[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-05-25 Thread thiemowmde
thiemowmde added a comment.


  Thanks for the clarification, this is indeed an important difference. 
Properties like ISBN are not classifying by value but by the pure fact that 
they exist.



[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-05-24 Thread hoo
hoo added a comment.


  In https://phabricator.wikimedia.org/T132839#2322937, @thiemowmde wrote:
  
  > […]
  >  I suggest to:
  >
  > 1. Add unspecific identifiers that apply to all kinds of items to the 
`$wgPropertySuggesterClassifyingPropertyIds` setting.
  
  
  Why? I don't see how an identifier **value** could ever be classifying 
(`$wgPropertySuggesterClassifyingPropertyIds` is about classifying based on 
values not just the properties used, although that might not be correctly 
implemented right now for strings/ external ids).



[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-05-24 Thread thiemowmde
thiemowmde added a comment.


  I believe this is wrong. There are "external-identifier" properties that 
really should be suggested the moment it becomes clear what kind of item you 
are editing. For example, something with "instance of: book" should get an ISBN 
number a.s.a.p., and the other way around. Or: an item that happens to be a 
Rijksmonument in the Netherlands must get a Rijksmonument ID.
  
  I suggest to:
  
  1. Add unspecific identifiers that apply to all kinds of items to the 
`$wgPropertySuggesterClassifyingPropertyIds` setting.
  2. Add stuff to `$wgPropertySuggesterDeprecatedIds` that should never be 
suggested, unless you search for it explicitly.



[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-05-24 Thread daniel
daniel added a comment.


  It does not look like PropertySuggester-Python is currently applying any 
filtering based on data type (or value type). If we want to add such filtering, 
the best place would probably be in `write_row` in `CsvWriter.php`. We could 
also filter while reading the input file, but we would have to do this twice, 
in `JsonReader` and in `XmlReader`.
  
  Note however, if we filter out properties with specific data types 
completely, such properties will never be suggested.
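
The kind of filter described here could be sketched as follows. This is not the actual PropertySuggester-Python code: the function `write_filtered_rows`, the tuple layout of the statements and the excluded set are assumptions made for illustration only.

```python
import csv
import io

# Hypothetical choice of data types to drop while writing.
EXCLUDED_DATATYPES = frozenset({"external-id"})


def write_filtered_rows(statements, out, excluded=EXCLUDED_DATATYPES):
    """Write (item, property, datatype) rows, skipping excluded data types.

    Returns the number of rows actually written.
    """
    writer = csv.writer(out)
    written = 0
    for item_id, property_id, datatype in statements:
        if datatype in excluded:
            # Dropped here, so this property never reaches the pair table.
            continue
        writer.writerow([item_id, property_id, datatype])
        written += 1
    return written


# Made-up statements for a single book item.
statements = [
    ("Q1", "P31", "wikibase-item"),
    ("Q1", "P212", "external-id"),
    ("Q1", "P50", "wikibase-item"),
]
buffer = io.StringIO()
written = write_filtered_rows(statements, buffer)
print(buffer.getvalue())
```

Filtering at the writer keeps the check in one place instead of duplicating it in both readers, but, as noted above, a property dropped this way can never be suggested at all.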



[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-05-10 Thread hoo
hoo added a comment.


  In https://phabricator.wikimedia.org/T132839#2280617, @thiemowmde wrote:
  
  > We also came up with a possible improvement: Some properties like "instance 
of" and "Commons category" are not selective. The fact that this property 
exists on an item does not say anything. We think it's a good idea to add such 
properties to a "non-selective" blacklist (or to the existing blacklist). This 
should reduce noise.
  
  
  We have special handling for instance of and subclass of that avoids this 
behaviour (these are "classifying properties"). Excluding very generic ones 
like identifiers and certain string ones is probably also a good idea (or, in 
the long run, weight them lower?).
  
  In https://phabricator.wikimedia.org/T132839#2281371, @thiemowmde wrote:
  
  > FYI, I did another run of code review on 
https://github.com/Wikidata-lib/PropertySuggester-Python and 
https://github.com/Wikidata-lib/PropertySuggester and could not find more 
suspicious code. The Python script should produce massive amounts of warnings 
when a datatype is missing. Does this happen? Are these logs reviewed after the 
script is run?
  
  
  I saw it before, but very rarely (like once or twice for a dump run at some 
point).
  
  I'll probably do a new dump run tomorrow and will examine the logs after, but 
I don't think that's going to give us any new insights.



[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-05-10 Thread thiemowmde
thiemowmde added a comment.


  FYI, I did another run of code review on 
https://github.com/Wikidata-lib/PropertySuggester-Python and 
https://github.com/Wikidata-lib/PropertySuggester and could not find more 
suspicious code. The Python script should produce massive amounts of warnings 
when a datatype is missing. Does this happen? Are these logs reviewed after the 
script is run?



[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-05-10 Thread thiemowmde
thiemowmde added a comment.


  We also came up with a possible improvement: Some properties like "instance 
of" and "Commons category" are not selective. The fact that this property 
exists on an item does not say anything. We think it's a good idea to add such 
properties to a "non-selective" blacklist (or to the existing blacklist). This 
should reduce noise.
  
  Hm. I just realized that "Commons category" does tell you one thing: That 
there are pictures and the item should have an "image" property too.



[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-05-10 Thread hoo
hoo added a comment.


  In https://phabricator.wikimedia.org/T132839#2280542, @Tobi_WMDE_SW wrote:
  
  > @Lydia_pintscher suggests to look into the code again and find out whether 
the problem comes from a change to Wikidata that's not reflected in 
PropertySuggester.
  
  
  We already did that three times by now, I think… not sure what the point of 
repeating that is.
  
  The next step here would be (in my opinion) to get the exact query that the 
suggester is running and then look through the results. In the end I'm fairly 
sure my analysis at https://phabricator.wikimedia.org/T132839#2270026 is 
correct. I find it quite unlikely that the rather naive current algorithm can 
give meaningful results for larger items.



[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-05-10 Thread Tobi_WMDE_SW
Tobi_WMDE_SW added a comment.


  @Lydia_pintscher suggests to look into the code again and find out whether 
the problem comes from a change to Wikidata that's not reflected in 
PropertySuggester.



[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-05-06 Thread Nikki
Nikki added a comment.


  The suggestions right now seem to be better than before, e.g. for the example 
in the description I get `P131`, mouth of the watercourse, sex or gender, date 
of birth. That still includes human properties, but at least mouth of the 
watercourse actually shows up now.
  
  Not quite the same, but presumably caused by how it decides which properties 
to select: I also often see country-specific properties for large countries 
show up as suggestions for items in other countries, e.g. 
https://www.wikidata.org/wiki/Q504582 currently suggests "China administrative 
division code" despite the item having the country set to the USA. If there's a 
change that would improve things like that too, that would be awesome. :)



[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-05-06 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.


  @hoo removed the external IDs from the correlation table. This seems to 
improve the situation for now. We'll still need to find a better solution 
though.



[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-05-06 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.


  @hoo tried the old correlation data and the suggestions are just as bad. This 
indicates a problem with the code.



[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-05-06 Thread Stashbot
Stashbot added a comment.


  Mentioned in SAL [2016-05-06T11:10:01Z]  Reverted the property suggester 
data to data from the 20160411 dump (done testing 
https://phabricator.wikimedia.org/T132839)



[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-05-06 Thread Stashbot
Stashbot added a comment.


  Mentioned in SAL [2016-05-06T11:02:30Z]  Overwrote property suggester 
data with data from the 20160215 dump 
(https://phabricator.wikimedia.org/T132839)



[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-04-26 Thread hoo
hoo added a comment.


  In https://phabricator.wikimedia.org/T132839#2239348, @thiemowmde wrote:
  
  > Ideas we had in today's meeting:
  >
  > - Could it be that PropertySuggester ignores all new data types (e.g. 
identifier)? On first look, 
https://github.com/wmde/wbs_propertypairs/blob/master/20160314/wbs_propertypairs.csv.gz 
is 3.42 MB but 
https://github.com/wmde/wbs_propertypairs/blob/master/20160411/wbs_propertypairs.csv.gz 
from the next month is 3.57 MB.
  
  
  That should not be the case; I addressed it with
  https://github.com/Wikidata-lib/PropertySuggester-Python/commit/6fc5610f3e0383d676c0abde20dcee7029274723
  (which is applied on the host where I create the suggester data).
  
  > - We can roll back the last database update to the one from March and see if
  >   the problem is still there. If it is, the problem is in the code. Otherwise
  >   it's just the data we have.
  
  Sure, we can try this for a bit.

To: hoo
Cc: thiemowmde, JanZerebecki, Lydia_Pintscher, hoo, Sjoerddebruin, Nikki, 
Aklapper, D3r1ck01, Izno, Wikidata-bugs, aude, Mbch331





[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-04-21 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.


  Yeah, I think this is quite recent too.
  I don't think splitting it between identifiers and non-identifiers will solve
  the underlying issue here.
  
  I take it from Marius' comments that the correlation data didn't change
  significantly. This leads me to suspect that a change in the code is at fault.

To: Lydia_Pintscher
Cc: Lydia_Pintscher, hoo, Sjoerddebruin, Nikki, Aklapper, D3r1ck01, Izno, 
Wikidata-bugs, aude, Mbch331





[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-04-18 Thread Nikki
Nikki added a comment.


  Its obsession with human properties does seem to be quite recent and 
noticeable. If it had been like this all along, I'm not sure why I would 
suddenly be noticing it so much now.
  
  I'm not sure how your suggestion would work. None of the suggestions for the 
example I gave are identifiers and if you want to add "mouth of the 
watercourse" (which is also not an identifier), wouldn't you still get the same 
set of suggestions?
  
  Personally I really like that I don't have to use specific "add" buttons to 
add specific types of statements. I don't pay any attention to which one I 
click (actually, most of the time I use the KeyShortcuts gadget and press "a", 
which appears to trigger the one in the identifiers section), they're all just 
statements to me and if I'm adding a bunch of them, I don't mentally filter 
them into different groups first.

To: Nikki
Cc: Lydia_Pintscher, hoo, Sjoerddebruin, Nikki, Aklapper, D3r1ck01, Izno, 
Wikidata-bugs, aude, Mbch331





[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-04-18 Thread Sjoerddebruin
Sjoerddebruin added a comment.


  In https://phabricator.wikimedia.org/T132839#2213354, @hoo wrote:
  
  > Can anyone confirm this is a recent regression? I have the feeling it is,
  > but I don't really know the old suggestions well enough to say for sure.
  >
  > A possible solution I could think of: Make the suggester smarter about
  > statement groups (so that it only suggests external ids in the external id
  > section) and then reduce the minimal probability for showing suggestions?
  
  
  There were some overlapping areas, but it has seemed worse for a few weeks
  now, yes. I agree with your solution!

To: Sjoerddebruin
Cc: Lydia_Pintscher, hoo, Sjoerddebruin, Nikki, Aklapper, D3r1ck01, Izno, 
Wikidata-bugs, aude, Mbch331





[Wikidata-bugs] [Maniphest] [Commented On] T132839: Property suggester suggests human properties for non-human items

2016-04-18 Thread hoo
hoo added a comment.


  Can anyone confirm this is a recent regression? I have the feeling it is, but 
I don't really know the old suggestions well enough to say for sure.
  
  A possible solution I could think of: Make the suggester smarter about 
statement groups (so that it only suggests external ids in the external id 
section) and then reduce the minimal probability for showing suggestions?
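  
  A rough sketch of that idea, assuming each suggestion carries a probability
  and a flag saying whether its property is an external id. The names
  Suggestion and filter_suggestions, the scores, and the 0.05 threshold are all
  hypothetical and not taken from the PropertySuggester code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    property_id: str      # e.g. "P106"
    probability: float    # correlation-based score in [0, 1] (made up here)
    is_external_id: bool  # True if the property has the external id data type


def filter_suggestions(suggestions, in_external_id_section, min_probability):
    """Keep suggestions that fit the section and pass the threshold, best first."""
    kept = [
        s for s in suggestions
        if s.is_external_id == in_external_id_section
        and s.probability >= min_probability
    ]
    return sorted(kept, key=lambda s: s.probability, reverse=True)


# Illustrative candidates with invented scores.
candidates = [
    Suggestion("P106", 0.40, False),
    Suggestion("P214", 0.08, True),
    Suggestion("P403", 0.06, False),
    Suggestion("P227", 0.02, True),
]

# Hypothetical lowered threshold, applied per statement group.
statements = filter_suggestions(candidates, False, min_probability=0.05)
identifiers = filter_suggestions(candidates, True, min_probability=0.05)
```

  Because each section only sees properties of its own kind, the threshold
  could be lowered without flooding either list with the other kind.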

To: hoo
Cc: Lydia_Pintscher, hoo, Sjoerddebruin, Nikki, Aklapper, D3r1ck01, Izno, 
Wikidata-bugs, aude, Mbch331


