GWicke added a comment.
> Where do I propose another mechanism for change propagation? The PageUpdater would do exactly what Revision does now: schedule DataUpdates.

EventBus & the change propagation service are moving away from scheduling "jobs", and towards an event processing approach based on Kafka. In this model, subscribers react to change events associated with resources. Event production and event processing / consumption are decoupled and decentralized. PageUpdater (and RevisionUpdater) as proposed seem to be moving in the opposite direction, towards more jobs & away from event processing.

> The blob store is (potentially) content-addressable, so the same blob may be used for different revisions of different pages.

Blob sharing would complicate your storage significantly, as you'd either have to forgo deleting content forever (very expensive for something like HTML renders), or incur the significant complexity of implementing an atomic reference counting scheme. For textual content, I am pretty certain that sharing is rare, and the complexity would be a net loss in performance and reliability.

> Even for blobs that have an incremental ID (e.g. using the current text table storage mechanism), the same blob would frequently be used for multiple blobs of the same page.

How would a dumb blob store figure out which content belongs to the same page (and is thus similar), if all it has is the content & some metadata, but not the page id, title, revision & render UUID? This is the same design issue that plagues ExternalStore, and something we addressed in RESTBase. With large-window compression algorithms like brotli, we are getting down to 2-3% of the input HTML size (see https://phabricator.wikimedia.org/T122028). Without this locality information, you are likely to use an order of magnitude more storage, as you are forgoing efficient delta compression.

I am generally trying to work out how RevisionContentLookup would work for use cases like fetching HTML from RESTBase.
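As a rough illustration of the locality point above (not how RESTBase does it): the sketch below compresses a handful of near-identical revisions once separately and once as a single stream. It uses zlib from the Python standard library as a stand-in for brotli; zlib's window is only 32 KB, so the numbers are not comparable to the 2-3% figure. The helper names and the synthetic revisions are made up for this example.

```python
import random
import zlib


def make_revisions(seed, count=5, words=400):
    """Build `count` synthetic revisions of one page, each differing by one word."""
    rng = random.Random(seed)
    vocab = [f"word{i}" for i in range(500)]
    base = [rng.choice(vocab) for _ in range(words)]
    revisions = []
    for n in range(count):
        edited = list(base)
        edited[rng.randrange(words)] = f"edit{n}"  # one small edit per revision
        revisions.append(" ".join(edited).encode("utf-8"))
    return revisions


def size_compressed_separately(blobs):
    """Store that knows nothing about which blobs belong together."""
    return sum(len(zlib.compress(b, 9)) for b in blobs)


def size_compressed_together(blobs):
    """Store that groups revisions of the same page into one compressed stream."""
    return len(zlib.compress(b"".join(blobs), 9))


revisions = make_revisions(seed=1)
separate = size_compressed_separately(revisions)
together = size_compressed_together(revisions)
print(f"separately: {separate} bytes, together: {together} bytes")
```

The grouped variant only works if the store knows which blobs belong to the same page, which is exactly the information a content-only blob store does not have.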
Some notes / questions:

- In addition to title and revision (which I assume remains an integer), we'll need an optional v1 UUID parameter to retrieve specific renders, in both the request & response interfaces.
- Will getTouched() return the UUID timestamp of a specific render (last-modified, essentially), or is this about page_touched? Also, should we expose UUIDs to make sure that we have a unique ID with a high-resolution timestamp?
- For content from RESTBase, read restrictions are always enforced as part of the API request. No information about the applied restrictions is returned. In this context, getReadRestrictions() would basically always return the empty set.

TASK DETAIL
https://phabricator.wikimedia.org/T107595

To: daniel, GWicke
Cc: Glaisher, JJMC89, RobLa-WMF, Yurik, ArielGlenn, APerson, TomT0m, Krenair, intracer, Tgr, Tobi_WMDE_SW, Addshore, Lydia_Pintscher, cscott, PleaseStand, awight, Ricordisamoa, GWicke, MarkTraceur, waldyrious, Legoktm, Aklapper, Jdforrester-WMF, Ltrlg, brion, Spage, MZMcBride, daniel, D3r1ck01, Izno, Luke081515, Wikidata-bugs, aude, jayvdb, fbstj, Mbch331, Jay8g, bd808
