[Wikidata-bugs] [Maniphest] T268625: Investigate the significant number of skipped Item IDs for newly created Wikidata items

2020-11-24 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.


  Thanks a ton, @MisterSynergy!

TASK DETAIL
  https://phabricator.wikimedia.org/T268625

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lydia_Pintscher
Cc: MisterSynergy, Tonina_Zhelyazkova_WMDE, Addshore, Pablo-WMDE, 
Lucas_Werkmeister_WMDE, Lydia_Pintscher, WMDE-leszek, Aklapper, Akuckartz, 
Iflorez, alaa_wmde, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Wikidata-bugs, aude, 
Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T268625: Investigate the significant number of skipped Item IDs for newly created Wikidata items

2020-11-24 Thread MisterSynergy
MisterSynergy added a comment.


  Following my report in T44362#6638174, I looked into this a little more. 
From Wikidata's MediaWiki database, I queried page creation times for the 
items created during the reported time period (14 Nov, 1:42 to 21 Nov, 1:42) 
and quickly plotted Q-ID vs. item creation timestamp.
  
  On the upper panel of the attached figure, you can see that there are two 
phases where the curve increases rapidly; a finer evaluation yields steep 
increases from 2020-11-14, 11:54 through 2020-11-16, 20:23, and another shorter 
period from 2020-11-18, 20:24 through 2020-11-18, 23:53. I also plotted the 
item creation rate (in 10 min bins) on the lower panel.
  
  F33924179: new_items_nov2020.png 
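  
  A minimal sketch of how the two panels could be derived from (Q-ID, creation timestamp) pairs. This is not the script used for the figure; the function names and the sample rows are hypothetical, and the real input would come from the database query mentioned above.

```python
from collections import Counter
from datetime import datetime, timedelta


def bin_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 10-minute bin."""
    return ts - timedelta(
        minutes=ts.minute % 10, seconds=ts.second, microseconds=ts.microsecond
    )


def creation_rate(rows):
    """Count created items per 10-minute bin (lower panel).

    rows: iterable of (numeric Q-ID, creation timestamp) pairs.
    """
    return Counter(bin_start(ts) for _, ts in rows)


def skipped_ids(rows):
    """Count the IDs missing between consecutive created items."""
    ids = sorted(qid for qid, _ in rows)
    return sum(b - a - 1 for a, b in zip(ids, ids[1:]))


# Hypothetical sample rows, for illustration only.
rows = [
    (100, datetime(2020, 11, 14, 11, 54, 10)),
    (101, datetime(2020, 11, 14, 11, 58, 0)),
    (105, datetime(2020, 11, 14, 12, 1, 30)),
]
rate = creation_rate(rows)
gaps = skipped_ids(rows)
```

  Plotting the Q-ID against the timestamp gives the upper panel; `rate` corresponds to the lower panel, and `gaps` counts the IDs that were assigned but never saved.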
  
  We can also compare the Grafana charts on the Wikidata edits dashboard. 
Particularly during the first and longer period, there were phases in which 
not a single user was editing at the rate limit (90/min).
  
  My findings are:
  
  - Skipped Q-IDs are not evenly distributed in time over the one-week 
period. It is reasonable to assume that someone triggers this with some sort of 
requests, and that these requests do not come in all the time.
  - The item creation rate (lower panel) looks perfectly sane, including during 
times when the curve on the upper panel increases quickly. This means that most 
of the Q-IDs are skipped during the steeper sections of the upper panel graph.
  - The two steep phases in the upper panel together comprise around 215,000 
seconds. Considering we have lost ~450,000 Q-IDs mostly during that time 
period, it is reasonable to assume that Q-IDs are skipped/wasted at a rate of 
roughly 2/sec during these phases. This is above the rate limit (1.5/sec), but 
since no edits at all seem to go through, I think that the rate limit is likely 
not the cause of this problem.
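  
  The arithmetic behind the last finding can be checked with a few lines. The phase boundaries, the ~450,000 figure and the 90/min limit are taken from the comment above; the variable names are only illustrative.

```python
from datetime import datetime

# The two steep phases reported above.
phases = [
    (datetime(2020, 11, 14, 11, 54), datetime(2020, 11, 16, 20, 23)),
    (datetime(2020, 11, 18, 20, 24), datetime(2020, 11, 18, 23, 53)),
]

# Total duration of both phases in seconds.
steep_seconds = sum((end - start).total_seconds() for start, end in phases)

skipped_ids = 450_000  # approximate number of lost Q-IDs
skip_rate = skipped_ids / steep_seconds  # skipped IDs per second

rate_limit = 90 / 60  # 90 edits per minute, expressed per second

print(f"{steep_seconds:.0f} s, {skip_rate:.2f} IDs/s, limit {rate_limit} /s")
```

  This gives 215,880 seconds and a skip rate of a little over 2 IDs per second, compared with a limit of 1.5 per second.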



[Wikidata-bugs] [Maniphest] T268625: Investigate the significant number of skipped Item IDs for newly created Wikidata items

2020-11-24 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE moved this task from Prioritized Product (prioritised 
from top to bottom) to Wikidata-Campsite-Iteration-∞ on the Wikidata-Campsite 
board.
Lucas_Werkmeister_WMDE edited projects, added Wikidata-Campsite 
(Wikidata-Campsite-Iteration-∞); removed Wikidata-Campsite.

WORKBOARD
  https://phabricator.wikimedia.org/project/board/3402/



[Wikidata-bugs] [Maniphest] T268625: Investigate the significant number of skipped Item IDs for newly created Wikidata items

2020-11-24 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a subscriber: Pablo-WMDE.
Lucas_Werkmeister_WMDE added a comment.


  Task inspection notes:
  
  - Idea from @Pablo-WMDE: add some logging to the entity ID generating code 
(with stack traces or similar), run our browser test suite (or API edge-to-edge 
tests, or similar), and see if anything unexpected causes entity IDs to be 
generated. Try to build a comprehensive overview of what causes entity IDs to 
be generated.
  
  - We seem to be in agreement that the “proper” solution to the problem of 
skipped item IDs would be to only assign an item ID very late, just before 
saving the item; however, this solution would likely take a lot of effort, 
since we’d have to adjust a lot of code to remove the assumption that item IDs 
are always available. (I dimly remember some related problems when editing 
Forms and Senses, whose IDs below the lexeme ID are also assigned in some 
different way…)
  
  - This doesn’t really belong to the investigation, but assuming that the main 
cause of skipped item IDs is T264450: Entity ID should not be assigned if rate 
limit is hit (due to T258354: remove noratelimit from bot group for Wikidata), 
I am thinking that one solution 
might be to introduce a separate rate limit for new item IDs (with the same 
limit as for page creation, probably), and to check that rate limit right 
before assigning an ID. This would also, in a way, “protect” us from the other 
issues – if you send a million new items with label/description conflicts per 
minute, then you now run into the “new item ID” rate limit before we even check 
for conflicts. (Could also reuse the same `create` rate limit, not sure.)
  
  - Can we confirm whether the rate limit (or some other error) is the main 
cause, by checking which errors are returned most frequently by the API? Maybe 
we have a dashboard of API errors or something similar.
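  
  To illustrate the idea of checking a dedicated rate limit right before an ID is assigned, here is a minimal sketch. It is not Wikibase code: `SlidingWindowLimiter`, `IdGenerator` and `new_item_id` are hypothetical names, and the limit values are made up for the example.

```python
import itertools
from collections import deque


class RateLimitExceeded(Exception):
    """Raised when a user has used up the "new item ID" limit."""


class SlidingWindowLimiter:
    """Allow at most max_events per user within window_seconds."""

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window = window_seconds
        self.events = {}  # user -> deque of event timestamps

    def allow(self, user, now):
        q = self.events.setdefault(user, deque())
        # Drop events that have fallen out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_events:
            return False
        q.append(now)
        return True


class IdGenerator:
    """Hands out sequential item IDs (hypothetical stand-in)."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self.last_issued = start - 1

    def next_id(self):
        self.last_issued = next(self._counter)
        return f"Q{self.last_issued}"


def new_item_id(user, now, limiter, ids):
    """Check the rate limit first, so a rejected request consumes no ID."""
    if not limiter.allow(user, now):
        raise RateLimitExceeded(user)
    return ids.next_id()


limiter = SlidingWindowLimiter(max_events=2, window_seconds=60)
ids = IdGenerator()
first = new_item_id("bot", 0, limiter, ids)
second = new_item_id("bot", 1, limiter, ids)
try:
    new_item_id("bot", 2, limiter, ids)
    rejected = False
except RateLimitExceeded:
    rejected = True
```

  Because the limit is checked before `next_id()` is called, the rejected third request leaves the counter untouched, which is the property the suggestion above is after.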



[Wikidata-bugs] [Maniphest] T268625: Investigate the significant number of skipped Item IDs for newly created Wikidata items

2020-11-24 Thread WMDE-leszek
WMDE-leszek updated the task description.


[Wikidata-bugs] [Maniphest] T268625: Investigate the significant number of skipped Item IDs for newly created Wikidata items

2020-11-24 Thread WMDE-leszek
WMDE-leszek added a subscriber: Lucas_Werkmeister_WMDE.
WMDE-leszek updated the task description.



[Wikidata-bugs] [Maniphest] T268625: Investigate the significant number of skipped Item IDs for newly created Wikidata items

2020-11-24 Thread WMDE-leszek
WMDE-leszek created this task.
WMDE-leszek added projects: MediaWiki-extensions-WikibaseRepository, Wikidata, 
Wikidata-Campsite.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  T44362 reports a huge number of skipped item IDs when new Wikidata items 
are created.
  
  We should understand what leads to the issues that prevent the creation of 
new Items and, in turn, to the "skipping" of IDs.
  In particular, it would be important to find out whether any of the recent 
code changes has increased the "conflict" rate.
  
  There are a number of hypotheses about issues that could be leading to 
creation problems, reported as:
  T264448: Entity ID should not be assigned for label/description conflict
  T264449: Entity ID should not be assigned for invalid entity data
  T264450: Entity ID should not be assigned if rate limit is hit
  T264451: Entity ID should not be assigned if blocked by AbuseFilter

