On 02.11.20 at 19:24, Daniel Kinzler wrote:
>
> [Re-posting with fixed links. Thanks for pointing this out Cormac!]
>
> This is the weekly TechCom board review. Remember that there is no meeting on
> Wednesday; any discussion should happen via email. For individual RFCs, please
> keep discussion to the Phabricator tickets.
>
That's another issue I wanted to raise: Platform Engineering is working on
switching ParserCache to JSON. For that, we have to make sure extensions only
put JSON-Serializable data into ParserOutput objects, via setProperty() and
setExtensionData(). We are currently trying to figure out how to best do that
for TemplateData.

TemplateData already uses JSON serialization, but then compresses the JSON
output, to make the data fit into the page_props table. This results in binary
data in ParserOutput, which we can't directly put into JSON. There are several
solutions under discussion, e.g.:

* Don't write the data to page_props, treat it as extension data in
ParserOutput. Compression would become unnecessary. However, batch loading of
the data becomes much slower, since each ParserOutput needs to be loaded from
ParserCache. Would it be too slow?

* Apply compression for page_props, but not for the data in ParserOutput. We
would have to introduce some kind of serialization mechanism into PageProps and
LinksUpdate. Do we want to encourage this use of page_props?

* Introduce a dedicated database table for templatedata. Cleaner, but schema
changes and data migration take a long time.

* Put templatedata into the BlobStore, and just the address into page_props.
Makes loading slower, maybe even slower than the solution that relies on
ParserCache.

* Convert TemplateData to MCR. This is the cleanest solution, but would require
us to create an editing interface for templatedata, and migrate out existing
data from wikitext. This is a long-term perspective.
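
To make the underlying problem concrete, here is a minimal Python sketch (not
MediaWiki code). It shows that a compressed blob cannot be embedded in JSON
as-is, and roughly what the second option would look like: keep the plain
structure in the cached output and compress only at the page_props boundary.
The payload, the helper names, and the use of zlib are all assumptions made for
illustration; the extension's actual compression and data layout may differ.

```python
import json
import zlib

# Hypothetical TemplateData payload; the real structure is richer.
template_data = {"description": "Example template", "params": {"1": {"label": "Name"}}}

# zlib is only a stand-in for whatever compression the extension applies.
compressed = zlib.compress(json.dumps(template_data).encode("utf-8"))


def is_valid_utf8(raw):
    """Return True if the byte string can be read as UTF-8 text."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def is_json_serializable(value):
    """Return True if json.dumps() accepts the value."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        return False


# The compressed blob is not text, so it cannot go into JSON directly.
binary_is_text = is_valid_utf8(compressed)
binary_in_json = is_json_serializable({"templatedata": compressed})

# Second option from the list: the cached output keeps the plain structure ...
cached_output = json.dumps({"templatedata": template_data})


# ... and compression happens only when writing to / reading from page_props.
def to_page_props_value(data):
    return zlib.compress(json.dumps(data).encode("utf-8"))


def from_page_props_value(raw):
    return json.loads(zlib.decompress(raw).decode("utf-8"))
```

In this sketch the two representations never mix: the JSON side only ever sees
the plain structure, and the binary form exists only at the storage boundary.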

To unblock migration of ParserCache to JSON, we need at least a temporary
solution that can be implemented quickly. A somewhat hacky solution I can see 
is:

* detect binary page properties and apply base64 encoding to them when
serializing ParserOutput to JSON. This is possible because page properties can
only be scalar values. So we can convert to something like { _encoding_: "base64",
data: "34c892ur3d40" }, and recognize the structure when decoding. This wouldn't
work for data set with setExtensionData(), since that could already be an arbitrary
structure.
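
A rough Python sketch of that hack, to show the shape of the round trip. This
is not the ParserOutput implementation: the function names are hypothetical,
binary values are modeled as Python bytes (in PHP, detecting them would need
something like a UTF-8 validity check), and zlib again stands in for the real
compression. Only the { _encoding_, data } wrapper comes from the description
above.

```python
import base64
import json
import zlib

ENCODING_KEY = "_encoding_"  # marker key from the example in this message


def encode_property(value):
    """Wrap a binary scalar so that the result is JSON-serializable."""
    if isinstance(value, bytes):
        return {
            ENCODING_KEY: "base64",
            "data": base64.b64encode(value).decode("ascii"),
        }
    return value


def decode_property(value):
    """Reverse encode_property(); non-wrapper values pass through unchanged."""
    if isinstance(value, dict) and value.get(ENCODING_KEY) == "base64":
        return base64.b64decode(value["data"])
    return value


def properties_to_json(props):
    """Serialize a flat name -> scalar map, wrapping binary values."""
    return json.dumps({name: encode_property(v) for name, v in props.items()})


def properties_from_json(text):
    """Deserialize and unwrap any base64-encoded values."""
    return {name: decode_property(v) for name, v in json.loads(text).items()}


# A compressed blob standing in for what TemplateData stores today.
blob = zlib.compress(json.dumps({"params": {"1": {"label": "Name"}}}).encode("utf-8"))
props = {"templatedata": blob, "displaytitle": "Example", "defaultsort": 42}

restored = properties_from_json(properties_to_json(props))
```

The decoder can safely treat any object with the marker key as a wrapper only
because page properties are scalars. Extension data may legitimately contain an
object of exactly that shape, and decode_property() would then silently turn it
into bytes, which is why the same trick does not carry over there.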

-- 
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
