https://bugzilla.wikimedia.org/show_bug.cgi?id=37992
--- Comment #47 from James Forrester <[email protected]> ---
[Please note that I am not a lawyer and not currently working in a criminal prosecution or national security context; this is not legal advice, just based on my professional experience.]

(In reply to comment #46)
> (In reply to comment #44)
> > Just a quick note, in case people are pushing hard on getting this done
> > so it can be used for VisualEditor, it's not the direction we're looking
> > at taking for VE.
> >
> > * We would need something that is client-side, not server-side (this
> > makes things faster and avoids the really icky legal issues).
>
> Are these 'icky legal issues' explained somewhere? Are you referring to the
> need for explicit copyright signoff before *distribution*? Why cant that be
> delayed until the real save button is pressed?

Sorry - when I quickly wrote that yesterday, I assumed they were obvious, but that probably says more about me and my priors than anything else (I have a background in criminal prosecution and central government/security) - my apologies. :-)

If we create a server-side private drafts facility, we invite malefactors to (ab)use the facility to store and (using accounts with a shared password) distribute:

* Terrorism-related materials;
* Child pornography (even if the Drafts are 'just' text, e.g. base64 encoding of images);
* Foreign intelligence service / espionage channels (let's please not make the Wikileaks comparisons true);
* Other criminal information; and
* Civil illicit information (e.g. breaking NDAs, corporate espionage, …)

All this kind of stuff is potentially uploaded to the public fora of Wikimedia's wikis (talk pages, etc.), but (because people are aware that it's public) it is much less frequently put there, and so is at a level where it can generally be dealt with by the existing community processes.
A private cache of drafts that isn't visible to any other user means that the community couldn't police it, or flag it for suppression/other admin action. Because of this, *even if no terrorist/pædophile/spy/criminal/mole ever did this*, WMF would get many, many subpœnæ, warrants, clandestine requests (National Security Letters, etc.) and other legal instruments, which would hugely increase the cost of complying with these matters.

I appreciate that lots of the other top-10 websites have server-side drafts facilities, but they have lots more money to throw away like this, and (allegedly) direct NSA/FBI/… access to their servers anyway, reducing the legal compliance cost (but something to which we would never agree, of course). People running less-visible services run into this problem from time to time, and in some cases end up shutting down their services; we can't take the risk that we'd need to do this at Wikimedia.

> A client-side solution is not going to go down well. People switch between
> computers and browsers, use internet cafes, etc. Wikipedia and VE are a
> web-app; ordinary users will expect auto-save to persist across their
> user-agents. A client side solution isnt auto-save; it is a data-recovery
> mechanism only, which is still useful in its own right, but each client will
> have its own quirks, etc.

I appreciate that this is a pain, but the drafts would probably need to only last for a few hours anyway (it's intentionally a DR solution, as you say) - Wikimedia is in the business of providing wikis, and long-term drafting is completely antithetical to the concept of a wiki, where "the page is the draft". If you have a change which is an improvement, even if it's not all it could possibly be, save it so others can collaborate with you. I would be very troubled if we were to move in a direction of encouraging such anti-wiki behaviours.
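
To make the idea of a short-lived, client-side recovery draft concrete, here is a minimal TypeScript sketch. It is not VisualEditor's implementation: `DraftRecovery`, `MemoryStore`, the `draft-recovery:` key prefix and the four-hour lifetime are all assumptions made for illustration (the comment only says "a few hours"). The `KeyValueStore` interface is a small subset of the browser's Web Storage API, so `window.localStorage` could be passed in where the in-memory stand-in is used here.

```typescript
// Subset of the Web Storage API; window.localStorage satisfies this shape.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

interface StoredDraft {
  content: string;
  savedAt: number; // milliseconds since the epoch
}

// Assumption: "a few hours" is taken to be four hours for this sketch.
const DRAFT_TTL_MS = 4 * 60 * 60 * 1000;

// Hypothetical helper: keeps one recovery draft per page and drops it once stale.
class DraftRecovery {
  constructor(private store: KeyValueStore, private ttlMs: number = DRAFT_TTL_MS) {}

  private key(pageTitle: string): string {
    return "draft-recovery:" + pageTitle;
  }

  save(pageTitle: string, content: string, now: number = Date.now()): void {
    const draft: StoredDraft = { content: content, savedAt: now };
    this.store.setItem(this.key(pageTitle), JSON.stringify(draft));
  }

  // Returns the draft text, or null if there is none or it has expired.
  recover(pageTitle: string, now: number = Date.now()): string | null {
    const raw = this.store.getItem(this.key(pageTitle));
    if (raw === null) {
      return null;
    }
    let draft: StoredDraft | null = null;
    try {
      draft = JSON.parse(raw) as StoredDraft;
    } catch (err) {
      draft = null; // unreadable entry: treat as no draft
    }
    const valid =
      draft !== null &&
      typeof draft.content === "string" &&
      typeof draft.savedAt === "number";
    if (draft === null || !valid || now - draft.savedAt > this.ttlMs) {
      this.store.removeItem(this.key(pageTitle));
      return null;
    }
    return draft.content;
  }
}

// In-memory stand-in for localStorage so the sketch runs outside a browser.
class MemoryStore implements KeyValueStore {
  private items: { [key: string]: string } = Object.create(null);

  getItem(key: string): string | null {
    const value = this.items[key];
    return value === undefined ? null : value;
  }
  setItem(key: string, value: string): void {
    this.items[key] = value;
  }
  removeItem(key: string): void {
    delete this.items[key];
  }
}
```

Nothing in this sketch leaves the user's machine, which is the property the comment cares about; the expiry check simply keeps the store behaving as a recovery mechanism rather than a long-term drafting area.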
> > * We would need something that lets us persist VE linear-models, or at
> > least Parsoid HTML+RDFa (otherwise each save will take dozens of seconds
> > of computation, slowing clients down and adding to burdens on the
> > cluster; until server-side storage of Parsoid HTML+RDFa is undertaken,
> > there's nowhere really appropriate for this to be saved for the Drafts
> > extension to implement, which is currently wikitext-based)
>
> There is no client-side hit required to persist VE as wikitext. VE already
> has an API call that accepts HTML+RDFa and produces wikitext; VE can add an
> API call that accepts HTML+RDFa and persists it as wikitext into the Drafts
> tables.

I'm sorry, this is not correct. The API call in VE that you mention is a server-side request. The process to save (or get a wikitext diff) works roughly as follows:

* VE client converts data model's linear model into Parsoid HTML+RDFa (this can take a noticeable amount of processing time for complex pages)
* VE client establishes a connexion with the (very thin) server-side VisualEditor code, and transmits the HTML+RDFa to it (noticeable network time for long pages)
* VE server establishes a connexion with the Parsoid service, and transmits the HTML+RDFa to it (server-side IO load)
* Parsoid pulls from storage or the MW API the previous version of the page and other data so that it can as-cleanly-as-possible serialise (server-side IO load)
* Parsoid serialises the HTML+RDFa from the client into wikitext based on this information (server-side CPU load)
* Parsoid responds down the connexion with VE server and transmits the wikitext to it (server-side IO load)
* VE server establishes a connexion with the MW cluster, and transmits the wikitext to it (server-side IO load)
* MW saves the wikitext, or fails and returns an error code with why (server-side CPU load)

This whole process can take multiple seconds (as you will see every time you press "save" in VE), and the first step is a significant CPU drain locally; I think this would be a really not-great thing to make happen every minute in the background, given the burden that VE already exhibits on some users' machines.

> Not sure that there is going to be an overall additional burden to the
> servers either. Converting HTML+RDFa to wikitext will be a memory&processor
> hit to the app servers, whereas persisting HTML+RDFa into the database is an
> IO hit as compared to wikitext.

Sure. But I'm not talking about persisting to the servers, as explained.

> Besides, what percentage of overall server load is spent on processing page
> saves to content pages (excluding rendering as autosave in VE doesnt need a
> render hit)? I'm guessing that isn't where the WMF resources are being
> pinched.

By definition, 100% of the Parsoid cluster's load is spent on processing page saves to re-render them now they have changed, either caused by a Parsoid client (like VE) or inside MW from another source (like the wikitext editor). The split is currently weighted towards MW changes, but that's an artefact of the current proportion of edits done with VisualEditor; we expect that figure to rise over time towards 100%.

> > * We would want (but not strictly /need/) something very flexible, so we
> > could persist undo stacks, etc. which could add a huge amount of
> > complexity to the Drafts extension for just one use case/client.
>
> I agree with Legoktm; it would be great if the source editor also had some
> features added to it. undo stacks is a major part of the problems holding up
> the deployment of Drafts; Drafts needs to handle multiple revisions of the
> same page being saved as a draft. I hear the majority of the editor base
> tend to use the source editor most of the time ;-)

Sure. But who's going to do that? Building expectations for features on the never-never isn't great for editors. :-(
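
As an illustrative aid only, the eight-step save path listed earlier in this comment can be written down as plain data, which makes it easy to see where the cost lands. The TypeScript below is a sketch, not code from VisualEditor, Parsoid, or MediaWiki: the names `SaveStage`, `saveStages`, `countByLoad` and the `Load` labels are hypothetical, and the stage descriptions simply paraphrase the list above.

```typescript
// Hypothetical labels for the kinds of cost named in the comment.
type Load = "client CPU" | "network" | "server IO" | "server CPU";

interface SaveStage {
  actor: string;  // which component does the work
  action: string; // what it does, paraphrased from the comment
  load: Load;     // the kind of cost the comment attributes to it
}

// The save path as described in the comment, in order.
const saveStages: SaveStage[] = [
  { actor: "VE client", action: "convert linear model to Parsoid HTML+RDFa", load: "client CPU" },
  { actor: "VE client", action: "send HTML+RDFa to VE server", load: "network" },
  { actor: "VE server", action: "send HTML+RDFa to Parsoid", load: "server IO" },
  { actor: "Parsoid", action: "fetch previous page version and other data", load: "server IO" },
  { actor: "Parsoid", action: "serialise HTML+RDFa to wikitext", load: "server CPU" },
  { actor: "Parsoid", action: "return wikitext to VE server", load: "server IO" },
  { actor: "VE server", action: "send wikitext to MW cluster", load: "server IO" },
  { actor: "MW", action: "save wikitext or return an error", load: "server CPU" },
];

// Tally how many stages fall under each kind of load.
function countByLoad(stages: SaveStage[]): Record<Load, number> {
  const counts: Record<Load, number> = {
    "client CPU": 0,
    "network": 0,
    "server IO": 0,
    "server CPU": 0,
  };
  for (const stage of stages) {
    counts[stage.load] += 1;
  }
  return counts;
}

console.log(countByLoad(saveStages));
```

Counting stages says nothing about how long each one takes; it only shows that a wikitext-based autosave would touch the client CPU, the network and three separate server-side components on every run.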
