I appreciate that someone is working on this, but it should have been
discussed more first. I would like to highlight that our goal should
NOT be to do this in the way that is simplest for MediaWiki developers
to implement and simplest for devops to set up and maintain. Our goal
should be to make this feed as simple as possible to consume in the
target application (bot, tool) for the developers of that tool. The
ideal feed should be simple enough to parse with something as trivial
as a shell script using netcat or telnet against a remote server
(absolutely no need for any 3rd party libraries). I am fine with JSON
as one option, but if it's the only option this new feed provides, it
will be very hard to consume in some tools. Basically, anything that
requires extra libraries makes it harder than it needs to be, even if
it would be more flexible and faster.
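To illustrate what I mean by "trivial", here is a rough sketch of the
kind of consumer that should be possible. The hostname, port, and field
names are made up for the example; in real use the sample_feed function
would be replaced by something like "nc rc-feed.example.org 9999"
reading one JSON object per line:

```shell
# Sketch only: hostname/port and the "title" field are hypothetical.
# Real usage would be:  nc rc-feed.example.org 9999 | while read ...
# Here we feed two sample lines instead, so the script runs standalone.
sample_feed() {
    printf '%s\n' '{"wiki":"enwiki","title":"Main Page","user":"Example"}'
    printf '%s\n' '{"wiki":"dewiki","title":"Hauptseite","user":"Beispiel"}'
}

# Pull the "title" out of each change with plain sed -- no JSON library.
sample_feed | sed -n 's/.*"title":"\([^"]*\)".*/\1/p'
```

This only works if the feed guarantees one object per line and does not
escape quotes inside values in surprising ways, which is exactly why the
line-oriented framing matters more to tool authors than the choice of
serialization format.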

On Mon, Mar 11, 2013 at 12:11 AM, Kevin Israel <[email protected]> wrote:
> On 03/10/2013 06:27 PM, Victor Vasiliev wrote:
>> On 03/10/2013 06:30 AM, Kevin Israel wrote:
>>> On 03/10/2013 12:19 AM, Victor Vasiliev wrote:
>>> One thing you should consider is whether to escape non-ASCII
>>> characters (characters above U+007F) or to encode them using UTF-8.
>>
>> "Whatever the JSON encoder we use does".
>>
>>> Python's json.dumps() escapes these characters by default
>>> (ensure_ascii = True). If you don't want them escaped (as hex-encoded
>>> UTF-16 code units), it's best to decide now, before clients with
>>> broken UTF-8 support come into use.
>>
>> As long as it does not add newlines, this is perfectly fine protocol-wise.
>
> If "Whatever the JSON encoder we use does" means that one day, the
> daemon starts sending UTF-8 encoded characters, it is quite possible
> that existing clients will break because of previously unnoticed
> encoding bugs. So I would like to see some formal documentation of the
> protocol.
>
>>> I recently made a [patch][1] (not yet merged) that would add an opt-in
>>> "UTF8_OK" feature to FormatJson::encode(). The new option would
>>> unescape everything above U+007F (except for U+2028 and U+2029, for
>>> compatibility with JavaScript eval() based parsing).
>>
>> The part between MediaWiki and the daemon does not matter that much
>> (except for hitting the size limit on packets, and even then we are on
>> WMF's internal network, so we should not expect any packet loss or
>> problems with fragmentation). The daemon extracts the wiki name from the
>> JSON it receives, so it re-encodes the change anyway in the middle.
>
> It's good to know that it's quite easy to change the format of the
> internal UDP packets without breaking existing clients -- that it's
> possible to start using UTF-8 on the UDP side if necessary.
>
> --
> Wikipedia user PleaseStand
> http://en.wikipedia.org/wiki/User:PleaseStand
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
