Re: [Wikitech-l] global cleanup of

2015-06-20 Thread Subramanya Sastry
On 06/20/2015 11:45 AM, Arlo Breault wrote: On Friday, June 19, 2015 at 1:38 AM, Amir E. Aharoni wrote: There may be more - I'm still looking for these. I was reading the discussion on gradually enabling VE for new accounts [3] and Kww writes there, "Further, we still have issues with stray no

[Wikitech-l] Parsoid announcement: Main roundtrip quality target achieved

2015-06-25 Thread Subramanya Sastry
Hello everyone, On behalf of the parsing team, here is an update about Parsoid, the bidirectional wikitext <-> HTML parser that supports Visual Editor, Flow, and Content Translation. Subbu. --- TL:DR; 1. Parsoid[1] roundtr

Re: [Wikitech-l] Parsoid announcement: Main roundtrip quality target achieved

2015-06-26 Thread Subramanya Sastry
On 06/25/2015 06:29 PM, David Gerard wrote: On 25 June 2015 at 23:22, Subramanya Sastry wrote: On behalf of the parsing team, here is an update about Parsoid, the bidirectional wikitext <-> HTML parser that supports Visual Editor, Flow, and Content Translation. Excellent. How clo

Re: [Wikitech-l] [Engineering] Parsoid announcement: Main roundtrip quality target achieved

2015-06-29 Thread Subramanya Sastry
On 06/29/2015 09:19 AM, Brad Jorsch (Anomie) wrote: On Thu, Jun 25, 2015 at 6:22 PM, Subramanya Sastry wrote: * Pare down rendering differences between the two systems so that we can start thinking about using Parsoid HTML instead of MWParse

Re: [Wikitech-l] Parsoid announcement: Main roundtrip quality target achieved

2015-06-29 Thread Subramanya Sastry
On 06/29/2015 09:20 AM, Brad Jorsch (Anomie) wrote: On Fri, Jun 26, 2015 at 11:52 AM, Subramanya Sastry wrote: The "PHP parser" used in production has 3 components: the preprocessor, the core parser, Tidy. Parsoid relies on the PHP preprocessor (access via the mediawiki API), so th

Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-07-24 Thread Subramanya Sastry
On 07/23/2015 01:07 PM, Ricordisamoa wrote: Il 23/07/2015 15:28, Antoine Musso ha scritto: Le 23/07/2015 08:15, Ricordisamoa a écrit : Are there any stable APIs for an application to get a parse tree in machine-readable format, manipulate it and send the result back without touching HTML? I'm s

Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-07-31 Thread Subramanya Sastry
On 07/31/2015 12:55 PM, Ricordisamoa wrote: Hi Subbu, thank you for this thoughtful insight. And thank you for starting this thread. :-) HTML is not a barrier by itself. The problem seems to be Parsoid being built primarily with VisualEditor in mind. While we want the DOM to be VE-friendly

Re: [Wikitech-l] [Engineering] Content WG: Templating, Page Components & editing

2015-08-12 Thread Subramanya Sastry
On 08/12/2015 09:33 AM, Brad Jorsch (Anomie) wrote: On Tue, Aug 11, 2015 at 7:12 PM, Gabriel Wicke wrote: TL;DR: Join us to discuss Templates, Page Components & editing on Thu, 13 August, 12:45 – 14:00 PDT [0]. I can't, so I'll just comment here. in

Re: [Wikitech-l] RFC: Replace Tidy with HTML 5 parse/reserialize

2015-08-17 Thread Subramanya Sastry
On 08/17/2015 10:15 PM, MZMcBride wrote: Failing fast and loud is good in lots of contexts. I don't think wiki editing is one of them. The only cited example of real breakage so far has been mismatched s. How often are you or anyone else adding s to pages? In my experience, most users rely on Med

Re: [Wikitech-l] RFC: Replace Tidy with HTML 5 parse/reserialize

2015-08-18 Thread Subramanya Sastry
On 08/18/2015 07:58 AM, MZMcBride wrote: Subramanya Sastry wrote: * Unclosed HTML tags (very common) * Misnested tags * Misnesting of tags (ex: links in links .. [http://foo.bar this is a [[foobar]] company]) * Fostered content in tables (this-content-will-show-up-outside-the-table

Re: [Wikitech-l] RFC: Replace Tidy with HTML 5 parse/reserialize

2015-08-19 Thread Subramanya Sastry
On 08/19/2015 08:22 AM, MZMcBride wrote: And, as several others have noted, you can't just disable Tidy, since the effects of unclosed tags are not confined to the content area, and there is a large amount of existing content that depends on it. I have seen the effects of Tidy being accidentally

[Wikitech-l] Visual diffing updates

2015-09-14 Thread Subramanya Sastry
https://github.com/subbuss/parsoid_visual_diffs I've pushed a bunch of updates over the last week which should now make this usable for comparing HTML files from different sources (not restricted to PHP parser and Parsoid). I did this so that this could be used to compare the rendering of Tidy

Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-09-17 Thread Subramanya Sastry
On 09/17/2015 07:44 PM, Ricordisamoa wrote: Stephen Niedzielski: "it seems like, as soon as you get the HTML the first thing you want to do, perhaps a little bit ironically because it's called Parsoid, it's parse the output a little bit more" https://www.youtube.com/watch?v=3WJID_WC7BQ&t=35m14s

Re: [Wikitech-l] Inhibitors for Mobile Content Service to use Parsoid output

2015-10-16 Thread Subramanya Sastry
On 10/15/2015 08:52 PM, Bernd Sitzmann wrote: As part of moving the Mobile Content Service to use Parsoid instead of action=mobileview[1] I've run into several missing features which make it significantly harder for the Mobile Content Service to use Parsoid, while providing the same functionality

Re: [Wikitech-l] Inhibitors for Mobile Content Service to use Parsoid output

2015-10-19 Thread Subramanya Sastry
On 10/16/2015 01:14 PM, Bernd Sitzmann wrote: In any case, given that Parsoid's HTML clients usually talk through RESTBase rather than with Parsoid directly, this optional API flag would also have to be supported in RESTBase, and could potentially follow the redirects on behalf of the clients. I

Re: [Wikitech-l] Reserving data-mw- attribute prefix in the sanitizer as non-user specifiable

2015-11-02 Thread Subramanya Sastry
On 11/02/2015 05:11 AM, Brian Wolff wrote: We already reserve data-ooui (by reserve, I mean blacklist in the sanitizer). But it feels wrong to use that for parts of mw that are not ooui. I would like to propose that we reserve data-mw- prefix as well for general usage by mediawiki/extensions (By

Re: [Wikitech-l] Red links

2015-11-03 Thread Subramanya Sastry
See T39902 which I see that James has already added to the phab ticket here. The reason why we don't have redlinks as part of Parsoid output is that it defeats caching / storage. Every time a page is created, all pages that link to it have to be flushed from storage. Subbu. On 11/03/2015 12:06

Re: [Wikitech-l] Parsoid convert arbitrary HTML?

2015-11-06 Thread Subramanya Sastry
On 11/06/2015 10:18 AM, James Montalvo wrote: Can Parsoid be used to convert arbitrary HTML to wikitext? It's not clear to me whether it will only work with Parsoid's HTML+RDFa. I'm wondering if I could take snippets of HTML from non-MediaWiki webpages and convert them into wikitext. The right

Re: [Wikitech-l] Parsoid convert arbitrary HTML?

2015-11-06 Thread Subramanya Sastry
to what Eric & Subbu have said, here is a link to the API documentation for this end point: https://en.wikipedia.org/api/rest_v1/?doc#!/Transforms/post_transform_html_to_wikitext_title_revision On Fri, Nov 6, 2015 at 8:47 AM, Subramanya Sastry wrote: On 11/06/2015 10:18 AM, James Montalvo w

Re: [Wikitech-l] Parsoid still doesn't love me

2015-11-06 Thread Subramanya Sastry
Parsoid is simply a wikitext -> html and a html -> wikitext conversion service. Everything else would be tools and libs built on top of it. Subbu. On 11/06/2015 02:29 PM, Ricordisamoa wrote: What if I need to get all revisions (~2000) of a page in Parsoid HTML5? The prop=revisions API (in ba

Re: [Wikitech-l] Parsoid still doesn't love me

2015-11-09 Thread Subramanya Sastry
On 11/09/2015 12:37 PM, Petr Bena wrote: Do you really want to say that reading from disk is faster than processing the text using CPU? I don't know how complex syntax of mw actually is, but C++ compilers are probably much faster than parsoid, if that's true. And these are very slow. What takes

[Wikitech-l] Parsoid entrypoint http://parsoid-lb.eqiad.wikimedia.org being decommissioned

2016-01-29 Thread Subramanya Sastry
Hello everyone, The public Parsoid endpoint http://parsoid-lb.eqiad.wikimedia.org is being decommissioned [1] once we migrate over all straggler references to that endpoint [1] possibly as soon as 3 weeks from now. As far as we know, there are very few requests to that endpoint right now, b

Re: [Wikitech-l] Parsoid entrypoint http://parsoid-lb.eqiad.wikimedia.org being decommissioned

2016-01-30 Thread Subramanya Sastry
On 01/30/2016 09:50 AM, Bartosz Dziewoński wrote: So what is the replacement for http://parsoid-lb.eqiad.wikimedia.org/_wikitext/ if I just want to see how Parsoid renders a piece of wikitext? It seems the fancy forms at https://en.wikipedia.org/api/rest_v1/?doc don't actually allow me to do t

Re: [Wikitech-l] External images in VE/Parsoid

2016-02-23 Thread Subramanya Sastry
On 02/23/2016 12:45 PM, James Montalvo wrote: I'm trying to make images from an external source, provided by a parser function, work with VisualEditor and Parsoid. For a very simplified illustration I added the following to the bottom of LocalSettings.php ``` $wgExtensionMessagesFiles['myPfTest

Re: [Wikitech-l] External images in VE/Parsoid

2016-02-23 Thread Subramanya Sastry
d was aware of this being valid/trusted HTML (via the isHTML flag), it could wrap the HTML in a DOM fragment and tunnel it through the sanitizer .. for example, as with the tag. Subbu. --James On Tue, Feb 23, 2016 at 1:24 PM, Subramanya Sastry wrote: On 02/23/2016 12:45 PM, James Montal

Re: [Wikitech-l] External images in VE/Parsoid

2016-02-23 Thread Subramanya Sastry
On 02/23/2016 02:46 PM, James Montalvo wrote: Why does it treat the img tag as a literal string, but not an h2 tag, for example? This is what the PHP parser does as well. You can try it in a sandbox [1], for example. That is understandable because bare image tags can link to all kinds of ext

Re: [Wikitech-l] External images in VE/Parsoid

2016-02-23 Thread Subramanya Sastry
On 02/23/2016 03:23 PM, James Montalvo wrote: So img tags are not whitelisted, and thus they are treated as a literal string. By default the PHP parser does the same thing, but there's $wgAllowImageTag to allow img tags. ... I totally understand that bare image tags are not ideal in many ca

Re: [Wikitech-l] Minor REST API cleanup: Remove experimental listings, make timeuuid parameter mandatory for data-parsoid

2016-03-07 Thread Subramanya Sastry
On 03/07/2016 07:49 PM, Gabriel Wicke wrote: tl;dr: You are *very* likely not affected. As a Parsoid-side clarification, data-parsoid is considered private information. This information is primarily used by Parsoid to minimize dirty diffs when edited HTML is converted to wikitext. So, Parso

Re: [Wikitech-l] tags are a usability nightmare for editing on mediawiki.org

2016-04-04 Thread Subramanya Sastry
Niklas and the language team: thanks for your efforts in enabling translation features. They are truly important and necessary. As for the topic of hacks, I feel wikitext's history has been one where people have stepped in to address critical issues / needs that existed at the time with whatev

Re: [Wikitech-l] tags are a usability nightmare for editing on mediawiki.org

2016-04-05 Thread Subramanya Sastry
2. Identifying document fragments for translation is another instance of the same problem of associating metadata with document fragments *across edits*. Citations, content-translation, comments-as-documentation, authorship information, maintaining-association-between-translated-fragments, etc.

[Wikitech-l] Kunal (User:Legoktm) moving to Parsing Team

2016-04-05 Thread Subramanya Sastry
Hi everyone, We would like to let you know that Kunal (User:Legoktm for those who don’t already know) is moving inside the Editing Department from the Collaboration Team to the Parsing Team. The Collaboration team is grateful for Kunal’s great work over the past year, especially on the backe

Re: [Wikitech-l] Kunal (User:Legoktm) moving to Parsing Team

2016-04-05 Thread Subramanya Sastry
I suppose you've figured out that I don't know how to write citations. Subtract -1 from N for all [N] in the body. :-) -S. On 04/05/2016 11:41 AM, Subramanya Sastry wrote: Hi everyone, We would like to let you know that Kunal (User:Legoktm for those who don’t already know) is mov

Re: [Wikitech-l] Kunal (User:Legoktm) moving to Parsing Team

2016-04-08 Thread Subramanya Sastry
On 04/08/2016 05:14 AM, Ricordisamoa wrote: You mean subtract 1, or add -1 *facepalm* :-) Il 05/04/2016 18:53, Subramanya Sastry ha scritto: I suppose you've figured out that I don't know how to write citations. Subtract -1 from N for all [N] in the body. :-) -S. On 04/05/201

Re: [Wikitech-l] some statistics about auto-inserted

2016-05-18 Thread Subramanya Sastry
Thanks Amir .. https://github.com/wikimedia/parsoid/blob/master/tools/fetch_ve_nowiki_edits.js is a quick hackjob of a script that I pulled together back in Oct 2015 which I used for a while to monitor counts (and the actual incidents) of nowikis ... This script could use a refresh and update

Re: [Wikitech-l] Loosing the history of our projects to bitrot. Was: Acquiring list of templates including external links

2016-08-01 Thread Subramanya Sastry
On 08/01/2016 11:37 AM, Marc-Andre wrote: ... Is there something we can do to make the passage of years hurt less? Should we be laying groundwork now to prevent issues decades away? One possibility is considering storing rendered HTML for old revisions. It lets wikitext (and hence parser) ev

Re: [Wikitech-l] Loosing the history of our projects to bitrot. Was: Acquiring list of templates including external links

2016-08-01 Thread Subramanya Sastry
TL:DR; You get to a spec by paying down technical debt that untangles wikitext parsing from being intricately tied to the internals of mediawiki implementation and state. In discussions, there is far too much focus on the fact that you cannot write a BNF grammar or yacc / lex / bison / whate

Re: [Wikitech-l] Loosing the history of our projects to bitrot. Was: Acquiring list of templates including external links

2016-08-03 Thread Subramanya Sastry
On 08/03/2016 07:17 PM, Rob Lanphier wrote: On Mon, Aug 1, 2016 at 10:15 PM, Subramanya Sastry wrote: When [a detailed list of stuff is] done, it become far more feasible to think of defining a spec for wikitext parsing that is not tied to the internals of mediawiki or its extensions. At that

Re: [Wikitech-l] Loosing the history of our projects to bitrot. Was: Acquiring list of templates including external links

2016-08-05 Thread Subramanya Sastry
On 08/03/2016 10:48 PM, Subramanya Sastry wrote: On 08/03/2016 07:17 PM, Rob Lanphier wrote: On Mon, Aug 1, 2016 at 10:15 PM, Subramanya Sastry wrote: ... I think it is feasible to get there. But, whether we want a spec for wikitext and should work towards that is a different question. In

Re: [Wikitech-l] Deploying the Linter extension to Wikimedia wikis

2016-10-24 Thread Subramanya Sastry
On 10/24/2016 08:42 AM, MZMcBride wrote: Does the extension distinguish between errors and warnings? Are there gradations of errors? For example, deprecated syntax v. invalid syntax? Technically, there are no errors with wikitext ... but yes, Parsoid knows what some of these "errors" are and

[Wikitech-l] Parsoid Security Update and Release

2016-11-01 Thread Subramanya Sastry
this release. [1] https://gerrit.wikimedia.org/r/#/c/319115 [2] https://www.mediawiki.org/wiki/Parsoid/Deployments#Monday.2C_October_31.2C_2016_around_1:15_PT:_Y.C2.A0Deployed_e503e801 [3] https://releases.wikimedia.org/debian/pool/main/p/parsoid/ [4] https://www.npmjs.com/package/parsoid Subraman

[Wikitech-l] [MediaWiki-announce] Parsoid Security Update and Release

2016-11-02 Thread Subramanya Sastry
le and as such, this exploit wasn't available for an exploit to steal user sessions. Thanks to the reporter of this exploit, Darian Patrick from the Security Team, Arlo Breault from the Parsing Team, Daniel Zahn and others from Ops for their assistance handling this bug and preparing this re

[Wikitech-l] Parsoid: node 0.1x deprecated now; node 0.1x support will end March 31st, 2017

2016-11-15 Thread Subramanya Sastry
node 0.1x right away from the master branch of Parsoid. Going forward, the Parsoid codebase will adopt ES6 features available in node v4.x and higher which aren't supported in node 0.1x and will constitute a breaking change. Subramanya Sastry (Subbu), Technical Lead and Manager, Parsing

[Wikitech-l] Dev Summit 2017 session: DOM-based semantics for wikitext

2016-11-22 Thread Subramanya Sastry
Hello everyone, I have proposed a 2017 Dev Summit session titled "Improved editability, tooling, reasoning, and performance by adopting DOM-based semantics for wikitext" [1]. The TL:DR; summary is to treat a wikitext article as made up of document fragments that are composed together instead

Re: [Wikitech-l] +2 request for yurik in mediawiki and maps-dev

2017-01-24 Thread Subramanya Sastry
On 01/25/2017 10:04 AM, Brian Wolff wrote: On Tuesday, January 24, 2017, Legoktm wrote: Hi, After speaking with Yurik, I've filed on his behalf to restore his membership in the mediawiki and maps-dev groups. I would appreciate guidance in whether th

Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-17 Thread Subramanya Sastry
(Top posting to quickly summarize what I gathered from the discussion and what would be required for Parsoid to expand pages with these transclusions). Parsoid currently relies on the mediawiki API to preprocess transclusions and return wikitext (uses action=expandtemplates for this) which it

Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-17 Thread Subramanya Sastry
On 05/17/2014 10:51 AM, Subramanya Sastry wrote: So, going back to your original implementation, here are at least 3 ways I see this working: 2. action=expandtemplates returns a ... for the expansion of {{T}}, but also provides an additional API response header that tells Parsoid that T was

Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-17 Thread Subramanya Sastry
On 05/17/2014 06:14 PM, Daniel Kinzler wrote: Am 17.05.2014 17:57, schrieb Subramanya Sastry: On 05/17/2014 10:51 AM, Subramanya Sastry wrote: So, going back to your original implementation, here are at least 3 ways I see this working: 2. action=expandtemplates returns a ... for the expansion

Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

2014-05-19 Thread Subramanya Sastry
On 05/19/2014 04:52 AM, Daniel Kinzler wrote: I'm getting the impression there is a fundamental misunderstanding here. You are correct. I completely misunderstood what you said in your last response about expandtemplates. So, the rest of my response to your last email is irrelevant ... and le

Re: [Wikitech-l] Wrapping signatures with a for discoverability

2014-09-30 Thread Subramanya Sastry
My summary based on reading and looking at discussions: Options are: (a) use a regexp to identify sigs -- dirty, but can apply to old discussions (b) new markup //{{#sig:..}} added via PST -- cleaner, but applies only to new discussions, but new wikitext markup (c) don't do it and leave it fo

Re: [Wikitech-l] S Page debuts as Technical Writer

2015-01-05 Thread Subramanya Sastry
On 01/05/2015 04:06 PM, Quim Gil wrote: It is an honor to announce that S Page[1] has moved from the Collaboration (Flow) team to join the WMF Engineering Community team as a Technical Writer[2]. We were really lucky to find such a great combination of English communication skills, awareness of M

Re: [Wikitech-l] Fun with code coverage

2015-01-14 Thread Subramanya Sastry
On 01/14/2015 06:57 PM, James Douglas wrote: Howdy all, Recently we've been playing with tracking our code coverage in Services projects, and so far it's been pretty interesting. Based on your coverage work for restbase, we added code coverage using the same nodejs tools (istanbul) and servi

Re: [Wikitech-l] The future of shared hosting

2015-01-16 Thread Subramanya Sastry
On 01/16/2015 01:49 AM, Stas Malyshev wrote: I think we're trying to fulfill a bit of a contradictory requirement here - running on the same software both the site of the size of *.wikipedia.org and a 1-visit-a-week-maybe $2/month shared hosting install. I think it would be increasingly hard to

Re: [Wikitech-l] Parsoid's progress

2015-01-20 Thread Subramanya Sastry
Some quick comments. As has already been alluded to, Parsoid does a couple different things. * It converts wikitext to html (in such a way that edits to the html can be serialized back to wikitext without introducing dirty diffs in the wikitext). * It converts html to wikitext (in such a way

Re: [Wikitech-l] Thoughts: stateless services with open servers?

2015-01-27 Thread Subramanya Sastry
On 01/27/2015 11:46 AM, C. Scott Ananian wrote: We already run a public Parsoid service. But, this doesn't serve wiki content from wikis other than wikimedia wikis and we are unlikely to do so with the existing production WMF cluster. The discussion is whether we should / will run a differen

Re: [Wikitech-l] Parsoid performance metrics

2015-03-31 Thread Subramanya Sastry
Thanks Christy for your work on the project. Your work in instrumenting Parsoid and providing us with the dashboards is quite useful and will help us keep on top of perf regressions, and identifying things to improve. Subbu. On 03/31/2015 01:04 PM, E.C Okpo wrote: Hello, Parsoid now has da

[Wikitech-l] Wikitext style norms for Parsoid serialization?

2015-04-02 Thread Subramanya Sastry
Greetings! Among other things, Parsoid converts HTML to wikitext. There are two requirements as far as this serialization / conversion goes: * Preserving HTML -> HTML semantics (i.e. a list when serialized to wikitext and parsed back should render as an identical list) * Enforcing certain wi

Re: [Wikitech-l] Wikitext style norms for Parsoid serialization?

2015-04-03 Thread Subramanya Sastry
On 04/02/2015 06:17 PM, Subramanya Sastry wrote: Greetings! Among other things, Parsoid converts HTML to wikitext. There are two requirements as far as this serialization / conversion goes: * Preserving HTML -> HTML semantics (i.e. a list when serialized to wikitext and parsed back sho

Re: [Wikitech-l] Announcement: Matthew Flaschen joins Wikimedia as Features Engineer

2012-12-07 Thread Subramanya Sastry
Welcome Matt. Subbu (work remotely from Minneapolis). Welcome Matt! Philadelphia \o/ On Fri, Dec 7, 2012 at 5:02 PM, James Forrester wrote: On 7 December 2012 13:06, Terry Chay wrote: Please join me in a belated welcome of Matthew Flaschen to the Wikimedia Foundation. :-) Welcome ab

Re: [Wikitech-l] Alpha version of the VisualEditor now available on the English Wikipedia

2012-12-12 Thread Subramanya Sastry
On Wed, Dec 12, 2012 at 8:59 AM, Chad wrote: On Wed, Dec 12, 2012 at 10:58 AM, Chris McMahon wrote: Would it be possible to enable VE on test2 in the same way? I would like to use it in a noisy way, and would rather not make noise on enwiki. It's also enabled for the user namespace, so p

Re: [Wikitech-l] gerrit support question: how to show raw file content after the commit in the browser, not as zip download

2012-12-12 Thread Subramanya Sastry
On Tue, Dec 11, 2012 at 10:47 AM, Ryan Kaldari wrote: This is actually my biggest annoyance with gerrit—that I can't view raw code from the change view. I can't fathom why they have a zip download link, but not a view link. Then I could copy code without copying all the line numbers. +1 this

Re: [Wikitech-l] Welcome, Munagala Ramanath (Ram)

2013-01-15 Thread Subramanya Sastry
Ram also has a PhD in Computer Science from University of Wisconsin, Madison. Welcome Ram, and small world ... I got my PhD in CS from UW, Madison as well! Have to share notes about UW, and Madison sometime. :) Subbu. ___ Wikitech-l mailing lis

Re: [Wikitech-l] RFC: Parsoid roadmap

2013-01-30 Thread Subramanya Sastry
On 01/30/2013 12:36 AM, Ariel T. Glenn wrote: Στις 23-01-2013, ημέρα Τετ, και ώρα 15:10 -0800, ο/η Gabriel Wicke έγραψε: Fellow MediaWiki hackers! After the pretty successful December release and some more clean-up work following up on that we are now considering the next steps for Parsoid. To

Re: [Wikitech-l] Deploying alpha of VisualEditor to non-English Wikipedias

2013-04-29 Thread Subramanya Sastry
On 04/29/2013 03:34 PM, David Gerard wrote: On 27 April 2013 09:25, David Gerard wrote: ... That doesn't quite address the issue: surely it's a problem if it's claiming to take feedback but the feedback is going nowhere. (I've been trying it and flagging seriously buggy behaviour in said commen

Re: [Wikitech-l] Remove 'visualeditor-enable' from $wgHiddenPrefs

2013-07-22 Thread Subramanya Sastry
On 07/22/2013 10:44 PM, Tim Starling wrote: Round-trip bugs, and bugs which cause a given wikitext input to give different HTML in Parsoid compared to MW, should have been detected during automated testing, prior to beta deployment. I don't know why we need users to report them. 500+ edits are

Re: [Wikitech-l] Remove 'visualeditor-enable' from $wgHiddenPrefs

2013-07-23 Thread Subramanya Sastry
so they can be fixed where they need to be. On 07/23/2013 02:50 AM, John Vandenberg wrote: On Tue, Jul 23, 2013 at 4:32 PM, Subramanya Sastry wrote: On 07/22/2013 10:44 PM, Tim Starling wrote: Round-trip bugs, and bugs which cause a given wikitext input to give different HTML in Parsoid co

Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread Subramanya Sastry
On 07/23/2013 05:28 PM, John Vandenberg wrote: On Wed, Jul 24, 2013 at 2:06 AM, Subramanya Sastry wrote: Hi John and Risker, First off, I do want to once again clarify that my intention in the previous post was not to claim that VE/Parsoid is perfect. It was more that we've fixed suffi

Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread Subramanya Sastry
On 07/23/2013 06:13 PM, John Vandenberg wrote: On Wed, Jul 24, 2013 at 9:02 AM, Subramanya Sastry wrote: On 07/23/2013 05:28 PM, John Vandenberg wrote: On Wed, Jul 24, 2013 at 2:06 AM, Subramanya Sastry wrote: Hi John and Risker, First off, I do want to once again clarify that my intention

Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread Subramanya Sastry
On 07/23/2013 06:02 PM, Subramanya Sastry wrote: On 07/23/2013 05:28 PM, John Vandenberg wrote: VE and Parsoid devs have put in a lot of effort to recognize broken wikitext source, fix it or isolate it, My point was that you don't appear to be doing analysis of how of all Wikipedia

Re: [Wikitech-l] dirty diffs and VE

2013-07-23 Thread Subramanya Sastry
On 07/23/2013 06:55 PM, John Vandenberg wrote: On Wed, Jul 24, 2013 at 9:02 AM, Subramanya Sastry wrote: http://parsoid.wmflabs.org:8001/stats This is the url for our round trip testing on 160K pages (20K each from 8 wikipedias). Very minor point .. there are ~400 missing pages on the list

Re: [Wikitech-l] dirty diffs and VE

2013-07-24 Thread Subramanya Sastry
On 07/24/2013 09:58 AM, Roan Kattouw wrote: There are a few things I wish it tested, but they're mostly about how it tests things rather than what data is collected. For instance, it would be nice if the round-trip tests could round-trip from wikitext to HTML *string* and back, rather than to H

Re: [Wikitech-l] dirty diffs and VE

2013-07-25 Thread Subramanya Sastry
On 07/25/2013 01:03 PM, Roan Kattouw wrote: On Wed, Jul 24, 2013 at 2:49 PM, C. Scott Ananian wrote: For what it's worth, both the DOM serialization-to-a-string and DOM parsing-from-a-string are done with the domino package. It has a substantial test suite of its own (originally from http://ww

Re: [Wikitech-l] Operations team announcements: Ryan and Leslie

2013-10-16 Thread Subramanya Sastry
Congrats Ryan and Leslie. -Subbu. On 10/16/2013 03:40 AM, Ken Snider wrote: Hello! I'm extremely pleased to announce that we've had two promotions within the Technical Operations team! Congratulations to both of you! Matt Flaschen

Re: [Wikitech-l] RFC cluster summary: HTML templating

2013-12-27 Thread Subramanya Sastry
While not directly related, Be Birchall's (Be is cced on this email) OPW project might be of interest to this discussion. In Parsoid, we are currently researching DOM-based templating solutions for a bunch of reasons, including experimenting with use as a wikitext templating engine (let us not

Re: [Wikitech-l] Parsoid: node 0.1x deprecated now; node 0.1x support will end March 31st, 2017

2017-04-07 Thread Subramanya Sastry
, Subramanya Sastry wrote: The Parsing team at the Wikimedia Foundation that develops the Parsoid service is deprecating support for node 0.1x. Parsoid is the service that powers VisualEditor, Content Translation, and Flow. If you don't run a MediaWiki install that uses VisualEditor, then

Re: [Wikitech-l] Setting up multiple Parsoid servers behind load balancer

2017-06-09 Thread Subramanya Sastry
On 06/09/2017 09:57 AM, Gabriel Wicke wrote: On Fri, Jun 9, 2017 at 12:56 AM, Alexandros Kosiaris < akosia...@wikimedia.org> wrote: I also don't think you need RESTBase as long as you are willing to wait for parsoid to finish parsing and returning the result. Apart from performance, there is a

[Wikitech-l] @author annotations in files in the mediawiki codebase

2017-06-12 Thread Subramanya Sastry
I noticed that core files have @author annotations. My take on this is as follows: Any active codebase (such as mediawiki) sees constant change and code is refactored, rewritten, renamed, files moved around, split up, etc. that the only meaningful interpretation of "@author" would be someone wh

Re: [Wikitech-l] @author annotations in files in the mediawiki codebase

2017-06-13 Thread Subramanya Sastry
Whether moving CREDITS to a centralised file actually solves the problem, rather than just shifting it around, is debatable. I think a centralized credits / contributors file is far better since it recognizes contributions even if those changes have since been all edited / rewritten / deleted.

Re: [Wikitech-l] @author annotations in files in the mediawiki codebase

2017-06-13 Thread Subramanya Sastry
On 06/13/2017 10:14 AM, C. Scott Ananian wrote: On Jun 13, 2017 6:24 AM, "Gergo Tisza" wrote: On Tue, Jun 13, 2017 at 7:11 AM, Subramanya Sastry wrote: I find these annotations misleading and wonder why they exist and what purpose they serve. > It can sometimes tell you wh

Re: [Wikitech-l] @author annotations in files in the mediawiki codebase

2017-06-13 Thread Subramanya Sastry
On 06/13/2017 03:33 AM, Antoine Musso wrote: Jon Robson opened a task about it a year or so ago: "Remove @author lines from code" https://phabricator.wikimedia.org/T139301 Aha .. thanks! I was looking at a 2016 code review y'day and noticed my comment about @author there ( https://ger

[Wikitech-l] Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018

2017-07-06 Thread Subramanya Sastry
How to read this post? -- * For those without time to read lengthy technical emails, read the TL;DR section. * For those who don't care about all the details but want to help with this project, you can read sections 1 and 2 about Tidy, and then skip to section 7. * For th

Re: [Wikitech-l] Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018

2017-07-06 Thread Subramanya Sastry
On 07/06/2017 09:59 AM, Pine W wrote: Thanks for the information. I understand that moving from HTML 4 to HTML 5 is probably a good idea. It is a good and necessary step. We want MediaWiki (and wikipedia) output to keep up with web standards. However, I am concerned about this statement:

Re: [Wikitech-l] Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018

2017-07-06 Thread Subramanya Sastry
On 07/06/2017 05:09 PM, Nicolas Vervelle wrote: Since the start of the Linter project (when we started off with the GSoC prototype in summer of 2014, and once again when Kunal picked it up in 2016), we have been in conversation with Nico V (frwiki and who maintains WPCleaner) and with Marios Mag

Re: [Wikitech-l] Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018

2017-07-07 Thread Subramanya Sastry
On 07/07/2017 04:05 PM, Chad wrote: On Thu, Jul 6, 2017 at 5:02 AM Subramanya Sastry wrote: 6. Tools to assist editors: Linter & ParserMigration In October 2016, at the parsing team offsite, Kunal ([[User:Legoktm (WMF)]]) dusted off

Re: [Wikitech-l] Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018

2017-07-07 Thread Subramanya Sastry
- On the Full Analysis window, the second button with a globe and a broom (Subbu, would you have a recommended icon for Linter related stuff ?) I will have to get back to you on this. I'll have to get some help from someone who can design / recommend something appropriate here. I a

Re: [Wikitech-l] Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018

2017-07-09 Thread Subramanya Sastry
Thanks Pine! One other related comment that perhaps I should have made earlier and that is relevant based on your broader question around efforts of editors, bots, where to spend time fixing things. https://www.mediawiki.org/wiki/Help:Extension:Linter#Why_and_what_to_fix tries to clarify how

Re: [Wikitech-l] Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018

2017-07-11 Thread Subramanya Sastry
On 07/11/2017 05:13 AM, Nicolas Vervelle wrote: But I have a few questions / suggestions regarding Linter for the moment: - Is it possible to retrieve also the localized names of the Linter categories and priorities: for example, on frwiki, you can see on the Linter page [1] that th

Re: [Wikitech-l] Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018

2017-07-12 Thread Subramanya Sastry
On 07/12/2017 01:12 AM, Nicolas Vervelle wrote: Hi Subbu, Using the localized names, I've found that not all Linter categories are listed in the API result. Is it normal ? For example, on frwiki, Linter reports 3 "mixed-content" errors for "Les Trolls (film)" but this category is not in the API

Re: [Wikitech-l] Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018

2017-07-13 Thread Subramanya Sastry
On 07/13/2017 02:18 AM, Nicolas Vervelle wrote: I think I've found some discrepancy between Linter reports. On frwiki, the page "Discussion:Yasser Arafat" is reported in the list for self-closed-tag [1], but when I run the text of the page through the transform API [2], I only get errors for obso

Re: [Wikitech-l] Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018

2017-07-17 Thread Subramanya Sastry
an discuss it there. Subbu. On 07/17/2017 04:10 AM, Nicolas Vervelle wrote: On Thu, Jul 13, 2017 at 9:18 AM, Nicolas Vervelle wrote: On Tue, Jul 11, 2017 at 5:05 PM, Subramanya Sastry wrote: On 07/11/2017 05:13 AM, Nicolas Vervelle wrote: - In the page dedicated to a category,

Re: [Wikitech-l] MW Function for parsing and modifying template arguments on a page

2017-08-09 Thread Subramanya Sastry
Take a look at Parsoid's output spec and the Parsoid API (as exposed through the REST API). See https://www.mediawiki.org/wiki/Specs/HTML/1.4.0#Template_markup and https://en.wikipedia.org/api/rest_v1/#!/Transforms/post_transform_html_to_wikitext_title_revision So, you fetch the HTML, edit da
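The snippet above is cut off, but the workflow it points to (fetch Parsoid HTML, change template data, send the HTML back through the html-to-wikitext transform) can be illustrated offline. What follows is a minimal sketch, not the author's code: it assumes the template arguments live in the `data-mw` attribute in the shape described by the Template markup section of the linked spec (`parts` → `template` → `params` → `{"wt": ...}`). The function name `set_template_param` and the sample data are hypothetical, and the REST calls themselves are deliberately left out.

```python
import json


def set_template_param(data_mw_json: str, name: str, value: str) -> str:
    """Return a copy of a data-mw JSON string with one template parameter set.

    Assumes the data-mw shape from the Parsoid HTML spec's template markup
    section; this helper is illustrative and not part of any Parsoid library.
    """
    data_mw = json.loads(data_mw_json)
    for part in data_mw.get("parts", []):
        # Parts are either plain wikitext strings or objects; only
        # {"template": {...}} objects carry editable params.
        if isinstance(part, dict) and "template" in part:
            part["template"].setdefault("params", {})[name] = {"wt": value}
            break  # edit only the first template in this transclusion
    return json.dumps(data_mw)


# Hypothetical data-mw value, as it might appear on an element with
# typeof="mw:Transclusion" in Parsoid HTML.
sample = json.dumps({
    "parts": [{
        "template": {
            "target": {"wt": "Infobox", "href": "./Template:Infobox"},
            "params": {"name": {"wt": "Old"}},
            "i": 0,
        }
    }]
})

updated = json.loads(set_template_param(sample, "name", "New"))
print(updated["parts"][0]["template"]["params"])
```

In a real client, the updated JSON would be written back to the element's `data-mw` attribute before posting the whole document to the html-to-wikitext transform endpoint linked above.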

[Wikitech-l] Followup (Re: Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018)

2017-08-10 Thread Subramanya Sastry
On 07/06/2017 08:02 AM, Subramanya Sastry wrote: TL;DR - The Parsing team wants to replace Tidy with a RemexHTML-based solution on the Wikimedia cluster by June 2018. This will require editors to fix pages and templates to address wikitext patterns that behave differently with RemexHTML

Re: [Wikitech-l] Followup (Re: Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018)

2017-08-10 Thread Subramanya Sastry
On 08/10/2017 02:49 PM, יגאל חיטרון wrote: Hello and thank you for this. Is there a phab ticket to follow the deployment process? Igal (User:IKhitron) We have the original Tidy replacement ticket (https://phabricator.wikimedia.org/T89331) but, as we get closer to start making phased deployment

Re: [Wikitech-l] Followup (Re: Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018)

2017-08-11 Thread Subramanya Sastry
17-08-10 22:06 GMT+03:00 Subramanya Sastry : On 08/10/2017 02:49 PM, יגאל חיטרון wrote: Hello and thank you for this. Is there a phab ticket to follow the deployment process? Igal (User:IKhitron) We have the original Tidy replacement ticket ( https://phabricator.wikimedia.org/T89331) but, as w

Re: [Wikitech-l] HHVM vs. Zend divergence

2017-09-19 Thread Subramanya Sastry
On 09/19/2017 10:34 AM, Bryan Davis wrote: For what it's worth, my opinion is that PHP is an actual FLOSS software project with years of history and core contributions from Zend who make their living with PHP. HHVM is a well funded internal project from Facebook that has experimented with FLOSS

Re: [Wikitech-l] First step for MCR merged: Deprecate and gut the Revision class

2017-12-22 Thread Subramanya Sastry
+1 to what Chad said reg deploy and what Toby, Chad & Scott said with their kudos and appreciation :) -Subbu. On 12/22/2017 11:00 AM, C. Scott Ananian wrote: Having just read through T183252, I feel Roan deserves a big hand for his "I take a walk and become Sherlock Holmes" detective work and

[Wikitech-l] Replacing Tidy on Wikimedia wikis, second wave

2018-01-22 Thread Subramanya Sastry
TL;DR - On January 31, 2018, on ru.wp, sv.wp, fi.wp and he.wp, we are going to turn off Tidy and switch to the Remex HTML5 parsing library. Besides those, another 200+ wikis will also be switched away from Tidy on that day. You can find the list of such wikis at T184656 [1]. Do any of you be

[Wikitech-l] node v4 support in Parsoid deprecated

2018-02-20 Thread Subramanya Sastry
As of Feb 20, 2018, the developers of Parsoid are deprecating support for node v4. Parsoid is the service that powers VisualEditor, Content Translation, Structured Discussions (formerly Flow), and other MediaWiki features. If you don't run a MediaWiki install that uses VisualEditor or these o

[Wikitech-l] Replacing Tidy on Wikimedia wikis: Next round

2018-03-05 Thread Subramanya Sastry
Hello everyone, On behalf of the Parsing Team @ the WMF, I am announcing our plans to replace Tidy with Remex on the next set of wikis. On March 13th, we plan to turn off Tidy on about 100 wikis that have fewer than 25 issues in all high-priority linter categories [1]. On March 14th, we pla

[Wikitech-l] Potential breaking change for VisualEditor installations in upcoming Parsoid release (v0.9.0)

2018-03-23 Thread Subramanya Sastry
Hello everyone, We are releasing the next version of the Parsoid deb and npm packages (v0.9.0) later today. There is one significant change in this release that might affect some VisualEditor installations. This version of Parsoid wraps sections in <section> tags and bumps the HTML version to 1.6.1. H

[Wikitech-l] Replacing Tidy on Wikimedia wikis: Next round

2018-03-26 Thread Subramanya Sastry
Hello everyone, On behalf of the Parsing Team @ the WMF, I am announcing our plans to replace Tidy with Remex on the next set of wikis. On April 4th, we plan to turn off Tidy on all wikiquotes (except frwikiquote) [1] and wikimedia wikis [2]. 23 wikis will have Tidy replaced. On April 11th,
