Lucas_Werkmeister_WMDE added a comment.

  In T231276#5441664 <https://phabricator.wikimedia.org/T231276#5441664>, @ArielGlenn wrote:
  
  > In T231276#5441586 <https://phabricator.wikimedia.org/T231276#5441586>, @Lucas_Werkmeister_WMDE wrote:
  >
  >> 
  >
  > ...
  >
  >> It’s part of the serialization. Not sure why that would be a new issue,
  >> though – this seems like a fairly fundamental issue (tying the page ID to
  >> the page content even though it’s not stable across delete+restore). Is it
  >> possible that File:Bolsonaro_etc is just the first file with structured
  >> data that was deleted and then restored?
  >
  > I'd be willing to put money on that.
  
  I think you’d lose it :) found some more with an ugly query:
  
    SELECT log_id, log_page, log_title, rev_id
    FROM logging
    JOIN revision ON log_page = rev_page
    JOIN slots ON rev_id = slot_revision_id
    -- this log entry restores a file
    WHERE log_type = 'delete'
    AND log_action = 'restore'
    AND log_namespace = 6
    -- and there is a corresponding revision, predating the restoration, that
    -- already had a mediainfo slot
    AND slot_role_id = (SELECT role_id FROM slot_roles WHERE role_name = 'mediainfo')
    AND rev_timestamp < log_timestamp
    -- captions were introduced in January 2019, so we can skip all earlier
    -- revisions and log entries
    AND rev_timestamp > 20190101000000
    AND log_timestamp > 20190101000000
    -- the restoration did not reuse the page ID (which we get from a
    -- corresponding deletion)
    AND log_page != (
      SELECT logdel.log_page
      FROM logging AS logdel
      WHERE logdel.log_type = 'delete'
      AND logdel.log_action = 'delete'
      AND logdel.log_namespace = 6
      AND logdel.log_title = logging.log_title
      LIMIT 1
    )
    LIMIT 10;
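
  As a toy illustration of the last condition only (did the restoration reuse the page ID?), here is a minimal Python sketch using an in-memory SQLite database. The table is a cut-down stand-in for `logging`, the rows are invented, and it uses a self-join instead of the correlated subquery above, so it is a simplification rather than the query that was actually run.

```python
import sqlite3

# Cut-down, hypothetical stand-in for the MediaWiki `logging` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logging ("
    "log_id INTEGER PRIMARY KEY, log_type TEXT, log_action TEXT, "
    "log_namespace INTEGER, log_title TEXT, log_page INTEGER)"
)

# Invented rows: A.jpg comes back under a new page ID, B.jpg keeps its page ID.
rows = [
    (1, "delete", "delete", 6, "A.jpg", 100),
    (2, "delete", "restore", 6, "A.jpg", 200),
    (3, "delete", "delete", 6, "B.jpg", 300),
    (4, "delete", "restore", 6, "B.jpg", 300),
]
conn.executemany("INSERT INTO logging VALUES (?, ?, ?, ?, ?, ?)", rows)

# Pair each file restoration with a deletion of the same title and keep
# only the pairs where the page ID differs.
changed = conn.execute(
    """
    SELECT r.log_title, d.log_page, r.log_page
    FROM logging AS r
    JOIN logging AS d
      ON d.log_title = r.log_title
     AND d.log_namespace = r.log_namespace
     AND d.log_type = 'delete' AND d.log_action = 'delete'
    WHERE r.log_type = 'delete' AND r.log_action = 'restore'
      AND r.log_namespace = 6
      AND r.log_page != d.log_page
    """
).fetchall()
print(changed)
```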
  
  For example, File:PL_Stanisław_Witkiewicz-Na_przełęczy_013.jpeg
  <https://commons.wikimedia.org/wiki/File:PL_Stanis%C5%82aw_Witkiewicz-Na_prze%C5%82%C4%99czy_013.jpeg>
  had a caption in revision 334323754
  <https://commons.wikimedia.org/wiki/Special:PermanentLink/334323754>
  (22:37, 10 January 2019), was deleted
  <https://commons.wikimedia.org/wiki/Special:Redirect/logid/278125433>
  at 23:10 the same day, and was later restored. Curiously enough, according to
  the log entry, the page ID at the time was 11632736 (`log_page` of
  `log_id = 278125433`); yet the serialization of that earlier revision,
  334323754, already contains the entity ID M75745807 (checked using this code
  <https://wikitech.wikimedia.org/wiki/User:Lucas_Werkmeister_(WMDE)/How_to_get_the_raw_text_of_a_page_or_revision>,
  but with `mediainfo` instead of `main` for the slot).
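
  To make the mismatch concrete, here is a minimal Python sketch. It assumes a MediaInfo entity ID is simply "M" followed by the page ID; the function names are made up for illustration and are not part of WikibaseMediaInfo.

```python
def expected_entity_id(page_id: int) -> str:
    """Entity ID we would expect for a page ID (assumption: 'M' + page ID)."""
    return f"M{page_id}"


def entity_id_matches_page(entity_id: str, page_id: int) -> bool:
    """True if the serialized entity ID agrees with the given page ID."""
    return entity_id == expected_entity_id(page_id)


# Values quoted in the comment above.
logged_page_id = 11632736           # log_page of the deletion log entry
serialized_entity_id = "M75745807"  # entity ID found in revision 334323754

mismatch = not entity_id_matches_page(serialized_entity_id, logged_page_id)
print(mismatch)
```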
  
  Perhaps WikibaseMediaInfo already contains code that’s supposed to take care
  of this? (Although updating serializations of old revisions like that sounds
  super dangerous to me.) And it broke recently?

TASK DETAIL
  https://phabricator.wikimedia.org/T231276

To: Lucas_Werkmeister_WMDE
Cc: hashar, ArielGlenn, Lucas_Werkmeister_WMDE, Liuxinyu970226, Aklapper, 
zeljkofilipin, darthmon_wmde, alaa_wmde, DannyS712, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Jonas, 
Wikidata-bugs, aude, Ricordisamoa, Lydia_Pintscher, Jdforrester-WMF, Mbch331, 
Jay8g, Krenair