[Wikidata-bugs] [Maniphest] T303677: Automatically generate descriptions for items based on their P31 (instance of) values

2024-02-17 Thread Nikki
Nikki added a comment.


  In T303677#9551210, @Midleading wrote:
  
  > If these automated descriptions weren't in WDQS (and consumed triples), how 
could the label service fetch them and bring them into results? Generating 
descriptions on the fly wouldn't work well for queries with too many results. 
Other queries using schema:description wouldn't work at all.
  
  The generated descriptions we're talking about here would come from the 
labels of `P31` statements on the items, which can be selected using 
`wdt:P31/rdfs:label`.
  
  See this query for example: https://w.wiki/9CUR. That's 25 semi-random items, 
their current description, the label of their `P31` statement, and a 
description created by using the current description if it exists, or the `P31` 
label if not. I assume the label service would do something similar to that.
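  
  A minimal sketch of that fallback, for illustration only. The query text is a 
guess at the pattern described above, not the contents of the linked short URL, 
and `FALLBACK_QUERY` and `shown_description` are hypothetical names. It also 
takes the first 25 matches rather than semi-random items.

```python
from typing import Optional

# Hypothetical query text: an assumption about the pattern described above,
# not a copy of the linked query. The wdt:, rdfs: and schema: prefixes are
# predefined on the Wikidata Query Service.
FALLBACK_QUERY = """
SELECT ?item ?description ?p31Label ?shown WHERE {
  ?item wdt:P31/rdfs:label ?p31Label .
  FILTER(LANG(?p31Label) = "en")
  OPTIONAL {
    ?item schema:description ?description .
    FILTER(LANG(?description) = "en")
  }
  BIND(COALESCE(?description, ?p31Label) AS ?shown)
}
LIMIT 25
"""


def shown_description(description: Optional[str],
                      p31_label: Optional[str]) -> Optional[str]:
    """Mirror COALESCE: use the stored description if present, else the P31 label."""
    if description is not None:
        return description
    return p31_label
```

  The Python function only restates what the `COALESCE` line does per row, so 
the fallback can be checked without running the query.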

TASK DETAIL
  https://phabricator.wikimedia.org/T303677

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Nikki
Cc: Midleading, dr0ptp4kt, VIGNERON, Gehel, dcausse, Lydia_Pintscher, 
Jdforrester-WMF, Denny, tfmorris, AndrewTavis_WMDE, Fuzheado, valerio.bozzolan, 
Lectrician1, waldyrious, Michael, DVrandecic, Bugreporter, Manuel, Nikki, 
Epidosis, Mahir256, Aklapper, Danny_Benjafield_WMDE, Astuthiodit_1, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, 
Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T303677: Automatically generate descriptions for items based on their P31 (instance of) values

2024-02-16 Thread Midleading
Midleading added a comment.


  If these automated descriptions weren't in WDQS (and consumed triples), how 
could the label service fetch them and bring them into results? Generating 
descriptions on the fly wouldn't work well for queries with too many results. 
Other queries using schema:description wouldn't work at all.



[Wikidata-bugs] [Maniphest] T303677: Automatically generate descriptions for items based on their P31 (instance of) values

2023-09-10 Thread Lectrician1
Lectrician1 added a comment.


  This is somewhat related, as autogeneration could also help human editors.
  
  After estimating how much time editors spend adding edit summaries to 
Wikipedia edits, I figured we could do the same for Wikidata editors manually 
adding item descriptions.
  
  Here's the Quarry query I used to find this: 
https://quarry.wmcloud.org/query/76538
  
  The query counts only non-bot revisions that add a description, based on the 
recentchanges table (so edits in the past 30 days). It includes only edits 
performed through the editor (mw.edit) and not a tool. It excludes revisions 
that are reverted or restored (since those have autogenerated summaries) and 
QuickStatements edits. The character count leaves out the autogenerated section 
names of the edit summary (the names between /* */), so it covers only the 
description added.
  
  Average number of human-typed descriptions added per day on Wikidata: 5,349
  Average number of typed characters (not autogenerated) per description: 43.8792
  
  Assuming descriptions are typed at an average speed of 200 characters per 
minute (or 50 wpm), these results work out to:
  
  234,660 description characters typed per day
  1,173 minutes per day typed
  19.5 hours per day spent typing descriptions
  7,137 hours per year spent typing descriptions
  
  **Note:** This is across all description languages added. Typing speeds in 
characters per minute may differ between languages.
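  
  The arithmetic behind these figures can be retraced with a short sketch. It 
starts from the posted daily character total and the assumed typing speed. The 
constant names are illustrative, and the 365-day year is an assumption that 
happens to reproduce the yearly figure.

```python
# Inputs taken as posted in the comment above. Multiplying the two rounded
# averages (5,349 and 43.8792) gives a slightly different total, so the
# posted daily figure is used directly.
CHARS_PER_DAY = 234_660
CHARS_PER_MINUTE = 200   # assumed typing speed from the comment
DAYS_PER_YEAR = 365      # assumption

minutes_per_day = CHARS_PER_DAY / CHARS_PER_MINUTE
hours_per_day = minutes_per_day / 60
hours_per_year = hours_per_day * DAYS_PER_YEAR

print(f"minutes per day: {minutes_per_day:.1f}")
print(f"hours per day:   {hours_per_day:.3f}")
print(f"hours per year:  {hours_per_year:.1f}")
```

  The posted figures match these values when rounded down.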



[Wikidata-bugs] [Maniphest] T303677: Automatically generate descriptions for items based on their P31 (instance of) values

2023-09-07 Thread Lydia_Pintscher
Lydia_Pintscher added subscribers: Denny, Jdforrester-WMF, Lydia_Pintscher.
Lydia_Pintscher added a comment.


  @Denny, @Jdforrester-WMF and I discussed this and the overlap with abstract 
descriptions at Wikimania. Here is what we came up with:
  We change Wikibase to generate an automated description. Initially this just 
takes the first best-ranked instance-of value. Once Wikifunctions and Abstract 
Wikipedia are ready, we can swap out this simple logic for something more 
complex. I think this avoids the complexity increase I feared and gives us a 
sensible way forward now.
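  
  A rough sketch of the "first best-ranked instance-of value" rule, not 
Wikibase code. It assumes the usual rank semantics (preferred statements win if 
any exist, otherwise normal ones are used, and deprecated ones are never 
picked). The statement shape and the name `automated_description` are 
hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical, simplified statement shape: (value_label, rank) pairs in the
# order they appear on the item. Rank is "preferred", "normal" or "deprecated".
Statement = Tuple[str, str]


def automated_description(p31_statements: List[Statement]) -> Optional[str]:
    """Return the label of the first best-ranked P31 value, or None."""
    # Try preferred statements first, then fall back to normal ones.
    for rank in ("preferred", "normal"):
        for label, statement_rank in p31_statements:
            if statement_rank == rank:
                return label
    # Only deprecated statements, or none at all.
    return None
```

  For example, an item with a normal "painting" value and a preferred "fresco" 
value would get "fresco" under this rule.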



[Wikidata-bugs] [Maniphest] T303677: Automatically generate descriptions for items based on their P31 (instance of) values

2023-08-10 Thread Nikki
Nikki added a comment.


  In T303677#9035100, @tfmorris wrote:
  
  > I'm surprised that this hasn't received any attention in 15 months. As an 
update to @Nikki's numbers, there are now on the order of 2.5 **BILLION** of 
these bot-generated descriptions. The top 5 alone represent over 2 billion 
triples. That's a huge waste of resources!
  
  What exactly are you counting? You don't seem to be counting the same thing 
as me, so the numbers can't be compared directly.
  
  I tried redoing my queries (and saved the URLs this time...):
  
  | Item | Matching descriptions (March 2022) | Matching descriptions (August 2023) | |
  | chemical compound (Q11173) | 22,436,766 | 38,777,020 | QLever |
  | encyclopedia article (Q13433827) | 9,877,236 | 10,056,470 | QLever |
  | galaxy (Q318) | 14,615,397 | 16,149,120 | QLever |
  | protein (Q8054) | 1,116,867 | 1,155,777 | QLever |
  | scholarly article (Q13442814) | 778,351,557 | 813,567,636 | query |

[Wikidata-bugs] [Maniphest] T303677: Automatically generate descriptions for items based on their P31 (instance of) values

2023-07-24 Thread Manuel
Manuel triaged this task as "High" priority.



[Wikidata-bugs] [Maniphest] T303677: Automatically generate descriptions for items based on their P31 (instance of) values

2023-07-23 Thread Bugreporter
Bugreporter added a comment.


  In T303677#9035100, @tfmorris wrote:
  
  > I'm surprised that this hasn't received any attention in 15 months. As an 
update to @Nikki's numbers, there are now on the order of 2.5 **BILLION** of 
these bot-generated descriptions. The top 5 alone represent over 2 billion 
triples. That's a huge waste of resources!
  >
  > | Q# | Entity Type | Descriptions (Billions) |
  > | Q13442814 | scholarly article | 1.32 |
  > | Q4167836 | Wikimedia category | 0.60 |
  > | Q4167410 | Wikimedia disambiguation page | 0.11 |
  > | Q11266439 | Wikimedia template | 0.09 |
  > | Q101352 | family name | 0.06 |
  >
  > In addition to the usability and resource issues, there's also a 
substantial language equity issue associated with the lack of this 
functionality. The language with the largest number of descriptions is Dutch 
simply because there's a Dutch-speaking bot operator who has vigorously added 
many, many machine-generated descriptions. On the flip side, languages without 
the privilege of bot operators supporting them go wanting and have no way to 
disambiguate the terms that autocomplete / search offers them. Of course, if 
someone were to start adding machine-generated descriptions for all those 
hundreds of languages, the situation would be completely untenable from a 
Blazegraph point of view.
  >
  > As an alternative to a textual description, I'll offer the suggestion to 
consider building an autocomplete widget which looks more like this: 
F37145761: Screen Shot 2023-07-21 at 2.22.16 PM.png. That's how Freebase 
Suggest did it back in 2008. Heck, you could even steal the code. One 
non-obvious aspect of their implementation was that they used metaschema 
annotations of types as being "Notable" or interesting enough to show the user. 
Similarly, the properties which were displayed varied by entity type and were 
controlled by metaschema annotations, so you might have birth date and place 
for a person, but containing/parent entity for something like a town or 
species. Of course, even just a simple list of the P31 values would be better 
than the current situation.
  
  See also https://autodesc.toolforge.org/, which is already used in various 
tools (e.g. Mix'n'Match). Previous discussion (dating back to 2012): 
https://www.wikidata.org/wiki/Wikidata:Automating_descriptions



[Wikidata-bugs] [Maniphest] T303677: Automatically generate descriptions for items based on their P31 (instance of) values

2023-07-21 Thread tfmorris
tfmorris added a comment.


  I'm surprised that this hasn't received any attention in 15 months. As an 
update to @Nikki's numbers, there are now on the order of 2.5 **BILLION** of 
these bot-generated descriptions. The top 5 alone represent over 2 billion 
triples. That's a huge waste of resources!
  
  | Q# | Entity Type | Descriptions (Billions) |
  | Q13442814 | scholarly article | 1.32 |
  | Q4167836 | Wikimedia category | 0.60 |
  | Q4167410 | Wikimedia disambiguation page | 0.11 |
  | Q11266439 | Wikimedia template | 0.09 |
  | Q101352 | family name | 0.06 |
  
  In addition to the usability and resource issues, there's also a substantial 
language equity issue associated with the lack of this functionality. The 
language with the largest number of descriptions is Dutch simply because 
there's a Dutch-speaking bot operator who has vigorously added many, many 
machine-generated descriptions. On the flip side, languages without the 
privilege of bot operators supporting them go wanting and have no way to 
disambiguate the terms that autocomplete / search offers them. Of course, if 
someone were to start adding machine-generated descriptions for all those 
hundreds of languages, the situation would be completely untenable from a 
Blazegraph point of view.
  
  As an alternative to a textual description, I'll offer the suggestion to 
consider building an autocomplete widget which looks more like this: 
F37145761: Screen Shot 2023-07-21 at 2.22.16 PM.png. That's how Freebase 
Suggest did it back in 2008. Heck, you could even steal the code. One 
non-obvious aspect of their implementation was that they used metaschema 
annotations of types as being "Notable" or interesting enough to show the user. 
Similarly, the properties which were displayed varied by entity type and were 
controlled by metaschema annotations, so you might have birth date and place 
for a person, but containing/parent entity for something like a town or 
species. Of course, even just a simple list of the P31 values would be better 
than the current situation.



[Wikidata-bugs] [Maniphest] T303677: Automatically generate descriptions for items based on their P31 (instance of) values

2023-07-21 Thread Manuel
Manuel updated the task description.


[Wikidata-bugs] [Maniphest] T303677: Automatically generate descriptions for items based on their P31 (instance of) values

2022-04-28 Thread Bugreporter
Bugreporter added a comment.


  One thing to consider: this may degrade ElasticSearch results.



[Wikidata-bugs] [Maniphest] T303677: Automatically generate descriptions for items based on their P31 (instance of) values

2022-03-30 Thread Bugreporter
Bugreporter added a comment.


  For people, the P106 value may be more useful than P31.



[Wikidata-bugs] [Maniphest] T303677: Automatically generate descriptions for items based on their P31 (instance of) values

2022-03-28 Thread Bugreporter
Bugreporter renamed this task from "Automatic generate descriptions for items 
based on their P31 (instance of) values" to "Automatically generate 
descriptions for items based on their P31 (instance of) values".



[Wikidata-bugs] [Maniphest] T303677: Automatically generate descriptions for items based on their P31 (instance of) values

2022-03-28 Thread Bugreporter
Bugreporter renamed this task from "Provide auto-generated descriptions for 
certain classes of items" to "Automatic generate descriptions for items based 
on their P31 (instance of) values".
Bugreporter updated the task description.
