[Wikidata-bugs] [Maniphest] T260778: [Investigation] Check if we can use RevDoc but inject property data type from another datasource

2020-08-26 Thread Lydia_Pintscher
Lydia_Pintscher closed this task as "Resolved".

TASK DETAIL
  https://phabricator.wikimedia.org/T260778

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Ladsgroup, Lydia_Pintscher
Cc: hoo, guergana.tzatchkova, Michael, Aklapper, Lydia_Pintscher, Ladsgroup, 
Hazizibinmahdi, Akuckartz, darthmon_wmde, jeropbrenda, Nandana, Lahi, Gq86, 
Xinbenlv, Vacio, Capankajsmilyo, GoranSMilovanovic, Fz-29, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, notconfusing, Wikidata-bugs, 
aude, Ricordisamoa, Alchimista, He7d3r, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T260778: [Investigation] Check if we can use RevDoc but inject property data type from another datasource

2020-08-25 Thread Ladsgroup
Ladsgroup added a comment.


  I want to also add that using wbgetentities is not possible in ORES because it doesn't support sending revids. Special:EntityData wouldn't work in ORES either, because ORES depends heavily on the mwapi library, which doesn't support such requests (we would need to inject a new type of session, which seems like a big overhead).
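Property data types belong to properties, not to item revisions, so they can be requested without any revid. Below is a minimal sketch that builds the request parameters and parses the response. It assumes the standard wbgetentities parameters `ids` and `props=datatype`, and a limit of 50 ids per request for non-bot clients. The helpers `datatype_request_params` and `parse_datatypes` are hypothetical and not part of ORES or mwapi.

```python
from typing import Dict, Iterable, Iterator, List

# Assumed per-request limit for the `ids` parameter of wbgetentities (non-bot clients).
MAX_IDS_PER_REQUEST = 50


def datatype_request_params(property_ids: Iterable[str],
                            batch_size: int = MAX_IDS_PER_REQUEST) -> Iterator[Dict[str, str]]:
    """Yield wbgetentities query parameters that fetch only property data types."""
    ids: List[str] = sorted(set(property_ids))  # deduplicate and make the order deterministic
    for start in range(0, len(ids), batch_size):
        yield {
            "action": "wbgetentities",
            "ids": "|".join(ids[start:start + batch_size]),
            "props": "datatype",
            "format": "json",
        }


def parse_datatypes(response: dict) -> Dict[str, str]:
    """Map property ID -> data type from a wbgetentities JSON response."""
    return {
        entity_id: entity["datatype"]
        for entity_id, entity in response.get("entities", {}).items()
        if "datatype" in entity  # missing or deleted properties carry no datatype
    }
```

For example, `list(datatype_request_params(["P31", "P17"]))` produces a single request with `ids` set to `P17|P31`. Batching keeps the number of API calls proportional to the number of distinct properties rather than the number of snaks.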


[Wikidata-bugs] [Maniphest] T260778: [Investigation] Check if we can use RevDoc but inject property data type from another datasource

2020-08-24 Thread Ladsgroup
Ladsgroup added a comment.


  In T260778#6406208, @Lydia_Pintscher wrote:
  
  > Would the last option mean that we'd still get the same quality score for a 
given Item no matter if it was scored live or from the dumps?
  
  They already differ, because the ORES dump analyzer can't hit the API for every entity to get the property suggester output, but the difference is not large. We won't widen the gap, but we will probably need to re-implement some features twice.


[Wikidata-bugs] [Maniphest] T260778: [Investigation] Check if we can use RevDoc but inject property data type from another datasource

2020-08-24 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.


  Would the last option mean that we'd still get the same quality score for a 
given Item no matter if it was scored live or from the dumps?


[Wikidata-bugs] [Maniphest] T260778: [Investigation] Check if we can use RevDoc but inject property data type from another datasource

2020-08-21 Thread Ladsgroup
Ladsgroup claimed this task.
Ladsgroup moved this task from To Do to Peer Review on the Item Quality Scoring 
Improvement (Item Quality Scoring Improvement - Sprint 1) board.
Ladsgroup added a comment.
Restricted Application added a project: User-Ladsgroup.


  Our options and the downsides of each option:
  
  - Using wbgetentities/Special:EntityData (the approach suggested as the alternative to this one)
    - Its downside is that it basically can't run on dumps: our entity dumps don't have histories, and we can't run it on the XML dumps because there is no way to inject the mapping.
  - The separate data source (this suggestion)
    - It has huge performance downsides: every request has to hit something like https://www.wikidata.org/w/api.php?action=wbgetentities&ids=P17&props=datatype. It also doesn't fully solve the dump problem, because it would still hit the API on every read of the history dump (that's how ORES handles datasources: it drops them before the next read). Rebuilding the history dump would therefore mean about 1B API hits just for this.
  - We can introduce the concept of a local server cache and hold the mapping there (ORES should have one and use it anyway)
    - That would be a lot of work.
    - I'm also not sure how that could be wired into the model and its features (maybe as a datasource, with ORES injecting it as an extractor? But then extractors don't have a chain to fall back to if the mapping isn't in the cache.)
  - We can add a basic cache wrapper around the APIExtractor so everything it fetches stays there, but that would bloat the memory footprint drastically and still wouldn't fully solve the performance issue: it would need to hit the API again as soon as the hot cache expires, and also whenever it sees a new combination of properties.
  - Hard-code the mapping, which means manually maintaining such a list and bloating the model and its memory footprint (probably around 1 GB per node) just for this.
  - Diverge the dump-based model and the API-based model: use the first option for the API and the fourth option for dumps (it wouldn't bloat the memory there).
    - We already do this because of the item completeness issue (it has to hit the property suggester API for every item). It doesn't mean we need to drop all features that use data types; it just means we can either hard-code the mapping or hit the API for the first part, and then reshape the feature processing based on that (same features, different ways of computing them).
    - Its downside is duplicating some effort, but not much.
  
  Honestly, the last option sounds the easiest to achieve. I think we should go that way.
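The "same features, different ways of achieving them" idea can be sketched as one feature written against an abstract data-type lookup. A live backend and a dump backend are then plugged in separately. This is a minimal sketch under stated assumptions. The entity layout follows the Wikibase JSON format (`claims` keyed by property ID). `DatatypeLookup`, `StaticLookup`, `ApiLookup` and `count_statements_by_datatype` are hypothetical names, not existing ORES or revscoring APIs.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Mapping, Optional, Protocol


class DatatypeLookup(Protocol):
    def datatype(self, property_id: str) -> Optional[str]: ...


class StaticLookup:
    """Dump backend (hypothetical): a fixed property -> data type mapping."""

    def __init__(self, mapping: Mapping[str, str]):
        self._mapping = dict(mapping)

    def datatype(self, property_id: str) -> Optional[str]:
        return self._mapping.get(property_id)


class ApiLookup:
    """Live backend (hypothetical): asks a fetch function, e.g. a wbgetentities call."""

    def __init__(self, fetch: Callable[[Iterable[str]], Dict[str, str]]):
        self._fetch = fetch
        self._known: Dict[str, str] = {}

    def prefetch(self, property_ids: Iterable[str]) -> None:
        # One batched fetch for every property not seen yet.
        missing = [p for p in property_ids if p not in self._known]
        if missing:
            self._known.update(self._fetch(missing))

    def datatype(self, property_id: str) -> Optional[str]:
        if property_id not in self._known:
            self.prefetch([property_id])
        return self._known.get(property_id)


def count_statements_by_datatype(entity: dict, lookup: DatatypeLookup) -> Counter:
    """Same feature for both backends: number of statements per property data type."""
    counts: Counter = Counter()
    for property_id, statements in entity.get("claims", {}).items():
        counts[lookup.datatype(property_id) or "unknown"] += len(statements)
    return counts
```

In the live path, calling `lookup.prefetch(entity["claims"])` before computing features keeps the cost at one API call per scored revision. The dump path never touches the network.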

WORKBOARD
  https://phabricator.wikimedia.org/project/board/4937/


[Wikidata-bugs] [Maniphest] T260778: [Investigation] Check if we can use RevDoc but inject property data type from another datasource

2020-08-20 Thread Ladsgroup
Ladsgroup added a subscriber: hoo.
Ladsgroup added a comment.


  This would make dump analysis much simpler, but since datasources are recomputed for each live request, it won't improve performance there (it actually decreases it slightly). We could hard-code property data types into ORES, but that would bloat the model file (and maintaining the list is another headache). A great solution would be to have a local server cache and keep the mapping there, but that's a long shot :(
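A local server cache of this kind could look roughly like the following. It is a time-bounded cache that falls back to a fetch function on a miss, so that lookups always have a chain to fall back to. This is a minimal in-process sketch, not existing ORES code. `TTLDatatypeCache`, the default TTL, and the injected `fetch` and `clock` callables are assumptions for illustration.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional, Tuple


class TTLDatatypeCache:
    """Hypothetical property -> data type cache with per-entry expiry and a fallback fetcher."""

    def __init__(self,
                 fetch: Callable[[Iterable[str]], Dict[str, str]],
                 ttl_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # pid -> (datatype, expires_at)

    def get_many(self, property_ids: Iterable[str]) -> Dict[str, Optional[str]]:
        now = self._clock()
        result: Dict[str, Optional[str]] = {}
        stale: List[str] = []
        for pid in dict.fromkeys(property_ids):  # deduplicate, keep order
            entry = self._entries.get(pid)
            if entry is not None and entry[1] > now:
                result[pid] = entry[0]  # fresh cache hit
            else:
                stale.append(pid)
        if stale:
            # One batched fetch for all misses, e.g. a single wbgetentities request.
            fetched = self._fetch(stale)
            for pid in stale:
                datatype = fetched.get(pid)
                if datatype is not None:
                    self._entries[pid] = (datatype, now + self._ttl)
                result[pid] = datatype
        return result
```

Since property data types rarely change, a long TTL keeps API traffic close to one request per new property. Unknown properties are not cached, so they are retried on the next lookup.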


[Wikidata-bugs] [Maniphest] T260778: [Investigation] Check if we can use RevDoc but inject property data type from another datasource

2020-08-19 Thread Ladsgroup
Ladsgroup created this task.
Ladsgroup added projects: Wikidata, Machine Learning Platform, revscoring, 
artificial-intelligence, Item Quality Scoring Improvement (Item Quality Scoring 
Improvement - Sprint 1).

TASK DESCRIPTION
  For example, get the property data types from another datasource (an API call) and inject them into the output.
  
  Timebox: 5-8 hours
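The injection step could be sketched as a pure function. It copies a revision's entity JSON and adds a `datatype` field to every snak whose property appears in a lookup table obtained from the other datasource. This assumes, as the task implies, that the stored revision JSON lacks that field. It also assumes the Wikibase JSON layout (`claims`, then statements with `mainsnak`, `qualifiers` and `references[].snaks`). `inject_datatypes` is a hypothetical helper, not part of revscoring.

```python
import copy
from typing import Dict, Iterator


def _iter_snaks(entity: dict) -> Iterator[dict]:
    """Yield every snak in an entity: main snaks, qualifiers and reference snaks."""
    for statements in entity.get("claims", {}).values():
        for statement in statements:
            if "mainsnak" in statement:
                yield statement["mainsnak"]
            for snaks in statement.get("qualifiers", {}).values():
                yield from snaks
            for reference in statement.get("references", []):
                for snaks in reference.get("snaks", {}).values():
                    yield from snaks


def inject_datatypes(entity: dict, datatypes: Dict[str, str]) -> dict:
    """Return a copy of `entity` with `datatype` added to snaks whose property is known."""
    enriched = copy.deepcopy(entity)  # leave the original revision document untouched
    for snak in _iter_snaks(enriched):
        datatype = datatypes.get(snak.get("property"))
        if datatype is not None and "datatype" not in snak:
            snak["datatype"] = datatype
    return enriched
```

Snaks whose property is missing from the lookup are left unchanged. Features downstream can then treat them as "unknown" instead of failing.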
