Thanks to everyone for all the responses. I'm learning a lot.

In the short term we need to figure out how to make this work without
RESTBase, but I've been convinced by this email chain that in the long term
we'll need to incorporate RESTBase into our setup.


At this point I think I've determined that the problem we're having is not
actually a Parsoid problem, but somehow related to MediaWiki Core (PHP)
response times. Something about my multi-server setup is causing 25% of MW
core response times to be 25x longer than normal. I didn't notice this in
my dev setup, prior to testing Parsoid, probably because I just assumed my
laptop was old and underpowered. In other words, normal page loads were
slower but I just figured that having multiple VMs up on my laptop
functioning as full app servers was the reason. Parsoid evidently has a
default timeout short enough that when Parsoid makes MW core API requests I
was getting failures, causing me to misinterpret it as a Parsoid issue.


To ensure it was not my underpowered laptop I moved my testing to a machine
with 12 CPUs and 64 GB RAM.


Our configuration script allows us to define our setup as follows:


load balancers = list, of, IP, addresses, ...

app servers = list, of, IP, addresses, ...

memcached servers = list, of, IP, addresses, ...

db master = a.single.ip.address

db replicas = list, of, IP, addresses, ...

parsoid servers = list, of, IP, addresses, ...

elasticsearch servers = list, of, IP, addresses, ...


I have not run it with that many servers yet, but it's theoretically
possible. A single server does not need to fill a single role, so in
testing thus far my configs look more like:


load balancers = server.3.ip.addr

app servers = server.1.ip.addr, server.2.ip.addr

memcached servers = server.1.ip.addr, server.2.ip.addr

db master = server.1.ip.addr

db replicas = server.2.ip.addr

parsoid servers = server.1.ip.addr, server.2.ip.addr

elasticsearch servers = server.1.ip.addr, server.2.ip.addr


In short: three servers, one exclusively a load balancer, two with
everything installed albeit one acting as DB master and the other as DB
replica.
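
To make the role-to-server mapping above concrete, here is a minimal Python sketch that parses the "role = list, of, hosts" notation used in this email and inverts it to show which roles each server carries. The notation is just my shorthand here, not meza's actual config file format, and the names parse_roles, roles_by_server and DEV_CONFIG are hypothetical.

```python
def parse_roles(text):
    """Parse 'role = host, host, ...' lines into {role: [hosts]}."""
    roles = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank lines and anything that is not a role line
        role, _, hosts = line.partition("=")
        roles[role.strip()] = [h.strip() for h in hosts.split(",") if h.strip()]
    return roles


def roles_by_server(roles):
    """Invert {role: [hosts]} into {host: [roles]}."""
    servers = {}
    for role, hosts in roles.items():
        for host in hosts:
            servers.setdefault(host, []).append(role)
    return servers


# The three-server dev layout described above (placeholder host names).
DEV_CONFIG = """
load balancers = server.3.ip.addr
app servers = server.1.ip.addr, server.2.ip.addr
memcached servers = server.1.ip.addr, server.2.ip.addr
db master = server.1.ip.addr
db replicas = server.2.ip.addr
parsoid servers = server.1.ip.addr, server.2.ip.addr
elasticsearch servers = server.1.ip.addr, server.2.ip.addr
"""

for server, server_roles in sorted(roles_by_server(parse_roles(DEV_CONFIG)).items()):
    print(server, "->", ", ".join(server_roles))
```

Commenting out a host in one of the role lines and re-running shows at a glance which services drop from two servers to one.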


We're running this setup in production with all servers configured as
"localhost", i.e. everything installed on one server.


I'm pretty sure I've narrowed down the 25x-longer-response-times to being a
multiple app-server problem because I can take the dev config above
(server.1.ip.addr, server.2.ip.addr, server.3.ip.addr) and comment out
various servers and re-run deploy. This allows me to quickly switch from a
single app server to two, two DBs to one, etc. I see the issue with
multiple app servers. I don't see it with a single app server, regardless
of whether the other services have 1 or 2 servers.
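
For anyone who wants to reproduce the measurement, the following is a rough Python sketch of how the slow fraction could be quantified per app server. It is not the script we actually ran. The function names and the 10x-median threshold are assumptions chosen for illustration, and the api.php path in the usage comment depends on how the wiki is installed.

```python
import statistics
import time
import urllib.request


def time_request(url, timeout=60.0):
    """Return the wall-clock seconds taken to fetch url once."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
    return time.perf_counter() - start


def summarize(latencies, slow_factor=10.0):
    """Return (median, fraction of samples slower than slow_factor * median)."""
    median = statistics.median(latencies)
    slow = [t for t in latencies if t > slow_factor * median]
    return median, len(slow) / len(latencies)


def probe(urls, samples=40):
    """Time each URL `samples` times and summarize the results per URL."""
    return {u: summarize([time_request(u) for _ in range(samples)]) for u in urls}


# Hypothetical usage, hitting each app server directly and then the load
# balancer (adjust the path to wherever api.php lives on your install):
#
#   probe([
#       "http://server.1.ip.addr/api.php?action=query&meta=siteinfo&format=json",
#       "http://server.2.ip.addr/api.php?action=query&meta=siteinfo&format=json",
#       "http://server.3.ip.addr/api.php?action=query&meta=siteinfo&format=json",
#   ])
```

Comparing the slow fraction when requesting each app server directly against the fraction seen through the load balancer should help show whether the delay comes from one backend, from both, or only appears when requests alternate between them.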


My LocalSettings.php files are at [1] and [2] for dual app servers.
These reference Extensions.php which _shouldn't_ have any impact but can be
found at [3]. The files are written by Ansible and I'm kind of bad at
getting the indenting correct...so, sorry about that if it looks funny. All
of this is created by our project called meza [4]. We weren't really
planning on announcing meza yet, but basically its purpose is to simplify
MediaWiki install with all the bells and whistles for "enterprise"
(whatever that means :) ) use cases. We've been running it on a single
server for about a year, but need to migrate to a high availability setup
to support 24/7 mission critical operations.


Any ideas what may cause two load-balanced app servers to respond slowly
25% of the time?


Thanks!

--James


[1]
https://gist.github.com/jamesmontalvo3/5adf207623454c9eff98e93152b43108#file-localsettings-app1-php

[2]
https://gist.github.com/jamesmontalvo3/5adf207623454c9eff98e93152b43108#file-localsettings-app2-php

[3]
https://gist.github.com/jamesmontalvo3/5adf207623454c9eff98e93152b43108#file-extensions-php

[4] https://github.com/enterprisemediawiki/meza

On Fri, Jun 9, 2017 at 12:57 PM, Subramanya Sastry <ssas...@wikimedia.org>
wrote:

> On 06/09/2017 09:57 AM, Gabriel Wicke wrote:
>
> On Fri, Jun 9, 2017 at 12:56 AM, Alexandros Kosiaris <
>> akosia...@wikimedia.org> wrote:
>>
>>> I also don't think you need RESTBase as long as you are willing to
>>> wait for parsoid to finish parsing and returning the result.
>>>
>> Apart from performance, there is also functionality that is missing
>> without
>> RESTBase:
>>
>>     - Diffs are going to contain a lot of extra changes (commonly called
>>     "dirty diffs"), as no original HTML or data-parsoid is available to
>>     Parsoid's selective serialization algorithm. This might make it
>> difficult
>>     to review changes.
>>
> What Gabriel said there about dirty diffs. So, this depends on whether
> wikis are concerned about their wikitext getting normalized to
> "Parsoid-determined canonical" formats (wrt choice of whitespaces, quotes,
> for ex.). For example, this is extremely important for wikimedia wikis,
> but may be less so for some smaller wikis, if they take a one-time
> normalization dirty diff and adopt identical norms in source editing.
>
>     - Switching between wikitext and visual editing won't work.
>>
>
> This is because of the dirty-diff requirement. As far as I understand,
> even if wikis are okay with dirty diffs, VE's source <-> html switching
> functionality requires restbase right now.
>
>     - Visual editing in general will very likely stop working once we
>> reduce
>>     the size of HTML by separating out metadata (see
>>     https://phabricator.wikimedia.org/T78676). We keep pushing this back
>> due
>>     to a lack of resources, but it is still planned, and might happen
>> within
>>     the next six months.
>>
>
> There are some unresolved questions about how willing (Parsoid) clients
> are to work with this stripped-html format. That and the matter of us being
> resource-strapped means we keep kicking this down the road. But, when this
> happens, this will break VE-editing unless VE and Parsoid hide the data-mw
> stripping behind a config flag.
>
> In short, using Parsoid directly for visual editing is an unsupported
>> configuration, and is likely to stop working altogether in the foreseeable
>> future.
>>
>
> Just to be clear, we haven't yet made any formal decision to go down this
> route, but Gabriel articulates the reasons why it might make sense to do
> this. There are some aspects to consider here:
> (a) whether we want to support this combination behind a config flag at
> all given that some functionality may not be available (unless Parsoid
> clients figure out ways to support some functionality without RESTBase)
> (b) the complexity (maintenance, testing, documentation, support) of
> supporting multiple combinations.
>
> We don't have fully resolved answers to this yet. I don't know what VE's
> take on this is -- so there is also that to consider. But, when we have
> firm resolutions on all of this, we will make suitable announcements on
> lists, suggest upgrade options, and update wikis.
>
> But, also, what Gabriel said earlier about RESTBase. If you are already
> installing Parsoid, adding RESTBase (since it is also node.js) with the
> default sqlite backend might not be a whole lot more complexity. So, if
> VE-editing wikis that use Parsoid start adopting this, that would also
> inform our decisions above.
>
> Subbu.
>
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>