On Tue, Nov 10, 2020 at 5:50 PM Gergo Tisza <gti...@wikimedia.org> wrote:
> On Tue, Nov 3, 2020 at 1:59 AM Daniel Kinzler <dkinz...@wikimedia.org>
> wrote:
>
>> TemplateData already uses JSON serialization, but then compresses the
>> JSON output, to make the data fit into the page_props table. This results
>> in binary data in ParserOutput, which we can't directly put into JSON.
>
>
> I'm not sure I understand the problem. Binary data can be trivially
> represented as JSON, by treating it as a string. Is it an issue of storage
> size? JSON escaping of the control characters is (assuming binary data with
> a somewhat random distribution of bytes) an ~50% size increase, UTF-8
> encoding the top half of bytes is another 50%, so it will approximately
> double the length - certainly worse than the ~33% increase for base64, but
> not tragic. (And if size increase matters that much, you probably shouldn't
> be using base64 either.)
>

The binary aspect here refers to the gzip output buffer. PHP represents
that buffer as a string, but the string is not valid UTF-8, so it cannot
be passed to the JSON encoder as-is. Attempting to do so produces a PHP
JSON error, and the encode call returns boolean false instead of a JSON
string.

Condensed example: https://3v4l.org/cJttU
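
For readers who do not have PHP at hand, here is a rough Python sketch of
the same underlying issue. It is an analogue, not the TemplateData code:
the payload is made up, and Python raises an exception where PHP returns
false. It shows that gzip output is not valid UTF-8, and that both options
mentioned in the thread (base64, or mapping each byte to a code point)
survive a JSON round trip.

```python
import base64
import gzip
import json

# Hypothetical payload standing in for a TemplateData JSON blob.
payload = json.dumps({"description": "Example template", "params": {"1": {}}})
compressed = gzip.compress(payload.encode("utf-8"))

# A gzip stream starts with the magic bytes 0x1f 0x8b. 0x8b can never start
# a UTF-8 sequence, so a strict UTF-8 decode of the buffer always fails.
try:
    compressed.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

# Option 1: base64, which yields plain ASCII.
as_base64 = base64.b64encode(compressed).decode("ascii")

# Option 2: treat every byte as the code point with the same value
# (Latin-1), so the JSON encoder sees an ordinary text string.
as_code_points = compressed.decode("latin-1")

# Both representations can be embedded in JSON and recovered byte for byte.
document = json.dumps({"b64": as_base64, "raw": as_code_points})
restored = json.loads(document)
from_base64 = base64.b64decode(restored["b64"])
from_code_points = restored["raw"].encode("latin-1")

print("valid UTF-8:", is_valid_utf8)
print("base64 round trip ok:", from_base64 == compressed)
print("code point round trip ok:", from_code_points == compressed)
```

The point of the sketch is that "treat it as a string" only works after an
explicit re-encoding step; the raw gzip buffer itself is rejected.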
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l