[Wikitech-l] Re: [Wikitext-l] Parsoid template transclusion behavior

2024-02-18 Thread Subramanya Sastry
[ Resending since I forgot to copy all lists -- please don't mind the duplicate response on wikitext-l. ] Our primary goal with Parsoid today is to ensure maximum compatibility with the current default parser -- without that, it would be impossible to switch over to Parsoid for all page

[Wikitech-l] Re: Comments like Google Docs

2023-12-10 Thread Subramanya Sastry
On 12/8/23 03:12, Felipe Schenone wrote: Hi! I'm thinking of writing a gadget to add inline comments to articles, similar to how Google Docs comments work. However, I'm sure I recently read somewhere about someone developing an extension or something with the same goal, but now I can't find

[Wikitech-l] Re: Changes in Parsoid redlink output (breaking backwards compatibility)

2022-11-30 Thread Subramanya Sastry
Thanks Kunal for the feedback. Isabelle is out today, so I'll chat with the team and figure out next steps. We realized this only after the train branch was cut and DiscussionTools tests failed. Part of the reason we didn't hold off the deployment is that we figured that this is an edge case

[Wikitech-l] ParserTests updates

2022-10-13 Thread Subramanya Sastry
Hello everyone, The Content Transform Team finally decided to act on this old task to split up the parserTests.txt file in core into a number of smaller files by functionality. We got a first

[Wikitech-l] Re: Parsertests and Parsoid

2022-05-09 Thread Subramanya Sastry
TLDR: Thanks for the nudge. Will do. Currently, PHPUnit in core can run wt2html and wt2wt tests against Parsoid. There is a patch chain in gerrit ( starting at https://gerrit.wikimedia.org/r/c/mediawiki/core/+/776283 ) that upgrades that support in MediaWiki core for running parser tests in

[Wikitech-l] Parsoid/JS is EOL

2021-09-30 Thread Subramanya Sastry
Hello everyone, MediaWiki 1.31 has reached EOL. Consequently, Parsoid/JS (the javascript version of Parsoid) is also EOL starting today. Parsoid was originally implemented in Javascript. It was ported to PHP in 2019 and the PHP port is now part of all MediaWiki releases. With the EOL of the

[Wikitech-l] Re: Stream of recent changes diffs

2021-07-09 Thread Subramanya Sastry
Parsoid has a linter extension which is well suited for something like this and was effectively developed with something like this in mind. It is currently enabled on *all* parses, but in the future, depending on how expensive lints

Re: [Wikitech-l] TechCom weekly digest 2020-08-26

2020-08-28 Thread Subramanya Sastry
On 8/28/20 6:19 AM, Daniel Kinzler wrote: In an effort to encourage wider and less formal participation through Wikitech-l (and to make our process more asynchronous) we'll also write to Wikitech-l as part of the board triage going forward. Thank you! :) == RFC: New API for Parsoid

[Wikitech-l] Last Parsoid/JS debian package released

2020-01-15 Thread Subramanya Sastry
Hello all, As you might have gleaned from previous posts to the list about Parsoid, Parsoid has now been ported from JS to PHP. As a result, we have stopped all further updates to the JS codebase. We have released the last version of the Parsoid/JS debian package, v0.11.0 [1]. This includes

Re: [Wikitech-l] Timelines for Parsoid/PHP to replace legacy PHP parser

2020-01-12 Thread Subramanya Sastry
On 1/12/20 5:33 PM, Lord_Farin wrote: Hi Wikitech, I've been catching up on the recent achievements regarding Parsoid/PHP, well done! Thanks! The switchover of wikitext engines is going to take some time. I would be surprised if we got all the ducks lined up before 18 months from now --

Re: [Wikitech-l] Parsoid/PHP replaces Parsoid/JS on the Wikimedia cluster

2019-12-19 Thread Subramanya Sastry
We haven't updated the setup instructions yet because we are going to be making some more changes to how Parsoid/PHP integrates with MediaWiki. We hope to be able to get it all squared away in time for the next MW release. That will make the installation as simple as upgrading to the latest

[Wikitech-l] Parsoid/PHP replaces Parsoid/JS on the Wikimedia cluster

2019-12-19 Thread Subramanya Sastry
Summary --- Parsoid/PHP, the PHP port of Parsoid is now live everywhere for all products on all wikis. Parsoid/JS is still deployed on the Wikimedia cluster but doesn't receive any traffic and will be decommissioned in the new year. Context: Making Parsoid the default MediaWiki wikitext

[Wikitech-l] Wikitext linting database updates temporarily turned off ( T240057 )

2019-12-06 Thread Subramanya Sastry
Hello everyone, On Monday this week, we enabled Parsoid/PHP everywhere for all products (Visual Editor, Mobile Content Service, Structured Discussions aka Flow, Content Translation) but two (Wikitext Linter, Language Variant support). Those last two were being supported by Parsoid/JS. However,

[Wikitech-l] Parsoid/PHP deployment update

2019-11-26 Thread Subramanya Sastry
Hi all, Further to my email from Nov 19 [1], here is the next update about Parsoid/PHP deployment on the Wikimedia cluster. As of today, * Parsoid/PHP is serving Visual Editor (VE), Content Translation (CX), and Mobile Content Service (MCS, used by the Android app) on group 0 and group 1

[Wikitech-l] Parsoid/PHP enabled on test and test2 wikis and the beta cluster

2019-11-19 Thread Subramanya Sastry
Hello everyone, Some of you might know that the parsing team has been porting Parsoid from JavaScript to PHP this year [1]. Over the last couple of months, we have done intensive testing in various modes (parser tests, round trip testing, HTML string diff testing) to verify correctness of

Re: [Wikitech-l] leading space and tag

2019-07-22 Thread Subramanya Sastry
On 7/22/19 11:05 AM, Subramanya Sastry wrote: On 7/22/19 10:51 AM, Arlo Breault wrote: On Jul 22, 2019, at 5:11 AM, Sergey F wrote: test2   test3 The result of conversion is: test2 test3 Yes, this looks like a bug See https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/524811

Re: [Wikitech-l] leading space and tag

2019-07-22 Thread Subramanya Sastry
On 7/22/19 10:51 AM, Arlo Breault wrote: On Jul 22, 2019, at 5:11 AM, Sergey F wrote: test2 test3 The result of conversion is: test2 test3 Yes, this looks like a bug See https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/524811 Thanks Thanks Arlo! Sergey: It is possible

Re: [Wikitech-l] [Tech Talks] July 10, 2019, 4 PM UTC, A Deployment Pipeline Overview

2019-07-10 Thread Subramanya Sastry
Reminder: this talk is in 45 mins.  -Subbu. On 7/8/19 12:15 PM, Subramanya Sastry wrote: Reminder: this talk is in 2 days.  -Subbu. On 6/26/19 9:38 AM, Subramanya Sastry wrote: Hi Everyone, It's time for Wikimedia Tech Talks 2019 Episode 6! Next month's talk will take place July 10, 2019

Re: [Wikitech-l] [Tech Talks] July 10, 2019, 4 PM UTC, A Deployment Pipeline Overview

2019-07-08 Thread Subramanya Sastry
Reminder: this talk is in 2 days.  -Subbu. On 6/26/19 9:38 AM, Subramanya Sastry wrote: Hi Everyone, It's time for Wikimedia Tech Talks 2019 Episode 6! Next month's talk will take place July 10, 2019 at 4:00 PM UTC. Topic: A Deployment Pipeline Overview Speaker: Alexandros Kosiaris, Senior

Re: [Wikitech-l] Patchsets by new Gerrit contributors waiting for code review and/or merge

2019-06-26 Thread Subramanya Sastry
On 6/20/19 2:08 AM, Andre Klapper wrote: Summary of worst repos with more than one patch waiting: 5x WikiPEG. 4x Parsoid. Given our current narrow focus on porting Parsoid from JS to PHP, this situation will continue for maybe another 2 months since we are not merging code on the JS side

[Wikitech-l] [Tech Talks] July 10, 2019, 4 PM UTC, A Deployment Pipeline Overview

2019-06-26 Thread Subramanya Sastry
Hi Everyone, It's time for Wikimedia Tech Talks 2019 Episode 6! Next month's talk will take place July 10, 2019 at 4:00 PM UTC. Topic: A Deployment Pipeline Overview Speaker: Alexandros Kosiaris, Senior Operations Engineer Summary: The deployment pipeline project has been ongoing for a while,

Re: [Wikitech-l] [Tech Talks] June 25, 2019, 6 PM UTC, Just what is Analytics doing back there?

2019-06-25 Thread Subramanya Sastry
Reminder: Talk starts in 5 mins.  -Subbu. On 6/24/19 9:31 AM, Subramanya Sastry wrote: Reminder: This talk is tomorrow. -Subbu. On 6/10/19 10:52 AM, Subramanya Sastry wrote: Hi Everyone, It's time for Wikimedia Tech Talks 2019 Episode 5! This month's talk will take place *June 25, 2019 at 6

Re: [Wikitech-l] [Tech Talks] June 25, 2019, 6 PM UTC, Just what is Analytics doing back there?

2019-06-24 Thread Subramanya Sastry
Reminder: This talk is tomorrow. -Subbu. On 6/10/19 10:52 AM, Subramanya Sastry wrote: Hi Everyone, It's time for Wikimedia Tech Talks 2019 Episode 5! This month's talk will take place *June 25, 2019 at 6:00 PM UTC.* *Topic*: Just what is Analytics doing back there? *Speaker*: Dan

[Wikitech-l] [Tech Talks] June 25, 2019, 6 PM UTC, Just what is Analytics doing back there?

2019-06-10 Thread Subramanya Sastry
Hi Everyone, It's time for Wikimedia Tech Talks 2019 Episode 5! This month's talk will take place *June 25, 2019 at 6:00 PM UTC.* *Topic*: Just what is Analytics doing back there? *Speaker*: Dan Andreescu, Senior Software Engineer, Analytics *Summary*: We take care of twelve systems. Data

Re: [Wikitech-l] Scrum of scrums/2019-05-22

2019-05-23 Thread Subramanya Sastry
I have added a Parsing update @ https://www.mediawiki.org/wiki/Scrum_of_scrums/2019-05-22#Parsing -Subbu. On 5/22/19 11:50 AM, Željko Filipin wrote: Hi, I would like to highlight a few notes from SoS Meeting Bookkeeping: * We're still looking for a backup facilitator.  * We're still looking

Re: [Wikitech-l] Scrum of scrums/2019-04-10

2019-04-11 Thread Subramanya Sastry
See updates from the parsing team below. On 4/10/19 11:28 AM, Željko Filipin wrote: Hi, for HTML version see https://www.mediawiki.org/wiki/Scrum_of_scrums/2019-04-10 Updated this as well. Parsing * Blocked by: * Blocking: * Updates: Parsoid/PHP port ongoing: * Phan now set

Re: [Wikitech-l] Thank you Tuesday

2019-02-19 Thread Subramanya Sastry
On 2/18/19 7:11 PM, MusikAnimal wrote: ... Your prompt, tireless efforts to restore stability over the weekend has not gone unnoticed. Y'all are amazing. +1. Subbu.

[Wikitech-l] Content Negotiation Protocol for Parsoid HTML in the REST API

2018-11-14 Thread Subramanya Sastry
Hello everyone, The Core Platform and Parsing teams at the Wikimedia Foundation are glad to announce the implementation of a content negotiation protocol for Parsoid HTML in the REST API [1]. This was deployed to the Wikimedia cluster on October 1, 2018. TL;DR - Parsoid HTML clients can now

Re: [Wikitech-l] Improvements to the wikitext diff view

2018-08-23 Thread Subramanya Sastry
This is good work and a nice feature. Thanks! Subbu. On 08/23/2018 08:14 AM, Michael Schönitzer wrote: Hi, As you may know, the team Technical Wishes of Wikimedia Deutschland has worked on improving the wikitext diff view for over a year in order to fix a technical limitation: Whenever a

[Wikitech-l] Tidy removed on all Wikimedia wikis (final update)

2018-07-11 Thread Subramanya Sastry
Hello everyone, On 6th July 2017, we made an announcement [1] about our plans to replace Tidy with RemexHtml on the Wikimedia cluster. On 5th July 2018, we made the final switchover from Tidy [2]. For those of you interested, we published a blog post [3] documenting the process and steps in

[Wikitech-l] Tidy replacement on final Wikimedia wikis on 5 July 2018

2018-06-14 Thread Subramanya Sastry
Hello everyone, That time is finally upon us. [1] On 6 July 2017, we made an announcement [2] about our plans to replace Tidy with RemexHtml on the Wikimedia cluster. Over the last year, we have progressively replaced Tidy on about 800 wikis in a phased manner. [3] A year later, as announced

[Wikitech-l] Fwd: Replacing Tidy on large wikis (by end of June 2018)

2018-04-20 Thread Subramanya Sastry
[ Crossposting my wikitech-ambassadors post from y'day for those of you active on different wikis. ] Hello everyone, TL;DR -- As you are aware from previous postings on this list [1] [2] [3] [4] [5] [6], we have been progressively replacing Tidy with RemexHtml on all wikis on the wikimedia

[Wikitech-l] Replacing Tidy on Wikimedia wikis: Next round

2018-03-26 Thread Subramanya Sastry
Hello everyone, On behalf of the Parsing Team @ the WMF, I am announcing our plans to replace Tidy with Remex on the next set of wikis. On April 4th, we plan to turn off Tidy on all wikiquotes (except frwikiquote) [1] and wikimedia wikis [2]. 23 wikis will have Tidy replaced. On April

[Wikitech-l] Potential breaking change for VisualEditor installations in upcoming Parsoid release (v0.9.0)

2018-03-23 Thread Subramanya Sastry
Hello everyone, We are releasing the next version of the Parsoid deb and npm packages (v0.9.0) later today. There is one significant change in this release that might affect some VisualEditor installations. This version of Parsoid wraps sections in <section> tags and bumps the HTML version to 1.6.1.

[Wikitech-l] Replacing Tidy on Wikimedia wikis: Next round

2018-03-05 Thread Subramanya Sastry
Hello everyone, On behalf of the Parsing Team @ the WMF, I am announcing our plans to replace Tidy with Remex on the next set of wikis. On March 13th, we plan to turn off Tidy on about 100 wikis that have fewer than 25 issues in all high-priority linter categories [1]. On March 14th, we

[Wikitech-l] node v4 support in Parsoid deprecated

2018-02-20 Thread Subramanya Sastry
As of Feb 20, 2018, the developers of Parsoid are deprecating support for node v4. Parsoid is the service that powers VisualEditor, Content Translation, Structured Discussions (formerly Flow), and other MediaWiki features. If you don't run a MediaWiki install that uses VisualEditor or these

[Wikitech-l] Replacing Tidy on Wikimedia wikis, second wave

2018-01-22 Thread Subramanya Sastry
TL;DR - On January 31, 2018, on ru.wp, sv.wp, fi.wp and he.wp, we are going to turn off Tidy and switch to the Remex HTML5 parsing library. Besides those, another 200+ wikis will also be switched away from Tidy on that day. You can find the list of such wikis at T184656 [1]. Do any of you

Re: [Wikitech-l] First step for MCR merged: Deprecate and gut the Revision class

2017-12-22 Thread Subramanya Sastry
+1 to what Chad said reg deploy and what Toby, Chad & Scott said with their kudos and appreciation :)  -Subbu. On 12/22/2017 11:00 AM, C. Scott Ananian wrote: Having just read through T183252, I feel Roan deserves a big hand for his "I take a walk and become Sherlock Holmes" detective work

Re: [Wikitech-l] HHVM vs. Zend divergence

2017-09-19 Thread Subramanya Sastry
On 09/19/2017 10:34 AM, Bryan Davis wrote: For what it's worth, my opinion is that PHP is an actual FLOSS software project with years of history and core contributions from Zend who make their living with PHP. HHVM is a well funded internal project from Facebook that has experimented with FLOSS

Re: [Wikitech-l] Followup (Re: Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018)

2017-08-11 Thread Subramanya Sastry
-10 22:06 GMT+03:00 Subramanya Sastry <ssas...@wikimedia.org>: On 08/10/2017 02:49 PM, יגאל חיטרון wrote: Hello and thank you for this. Is there a phab ticket to follow the deployment process? Igal (User:IKhitron) We have the original Tidy replacement ticket ( https://phabricator.wikimed

Re: [Wikitech-l] Followup (Re: Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018)

2017-08-10 Thread Subramanya Sastry
On 08/10/2017 02:49 PM, יגאל חיטרון wrote: Hello and thank you for this. Is there a phab ticket to follow the deployment process? Igal (User:IKhitron) We have the original Tidy replacement ticket (https://phabricator.wikimedia.org/T89331) but, as we get closer to start making phased

[Wikitech-l] Followup (Re: Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018)

2017-08-10 Thread Subramanya Sastry
On 07/06/2017 08:02 AM, Subramanya Sastry wrote: TL;DR - The Parsing team wants to replace Tidy with a RemexHTML-based solution on the Wikimedia cluster by June 2018. This will require editors to fix pages and templates to address wikitext patterns that behave differently with RemexHTML

Re: [Wikitech-l] MW Function for parsing and modifying template arguments on a page

2017-08-09 Thread Subramanya Sastry
Take a look at Parsoid's output spec and the Parsoid API (as exposed through the REST API). See https://www.mediawiki.org/wiki/Specs/HTML/1.4.0#Template_markup and https://en.wikipedia.org/api/rest_v1/#!/Transforms/post_transform_html_to_wikitext_title_revision So, you fetch the HTML, edit
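A minimal sketch of the fetch / edit / convert-back round trip described above, in Python. It only builds the HTTP requests and does not send them. The API root is taken from the URL quoted in the message; the `/page/html/{title}` and `/transform/html/to/wikitext/{title}/{revision}` paths and the `html` form field are assumptions inferred from the operation name in that link (`post_transform_html_to_wikitext_title_revision`) and may not match the currently deployed API. The function names are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# API root as it appears in the URL quoted in the message above.
API_ROOT = "https://en.wikipedia.org/api/rest_v1"


def build_html_fetch(title: str) -> Request:
    """Build (but do not send) a GET request for a page's Parsoid HTML.

    Assumption: the HTML is served at /page/html/{title}.
    """
    return Request(f"{API_ROOT}/page/html/{quote(title, safe='')}", method="GET")


def build_html_to_wikitext(title: str, revision: int, html: str) -> Request:
    """Build (but do not send) a POST request converting edited HTML to wikitext.

    Assumption: the path mirrors the operation name in the linked docs and
    the edited document is passed as a form field named "html".
    """
    url = (
        f"{API_ROOT}/transform/html/to/wikitext/"
        f"{quote(title, safe='')}/{revision}"
    )
    body = urlencode({"html": html}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

To actually run the round trip, each request would be passed to `urllib.request.urlopen`, with the template arguments edited in the fetched HTML (per the template markup spec linked above) between the two calls.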

Re: [Wikitech-l] Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018

2017-07-17 Thread Subramanya Sastry
it there. Subbu. On 07/17/2017 04:10 AM, Nicolas Vervelle wrote: On Thu, Jul 13, 2017 at 9:18 AM, Nicolas Vervelle <nverve...@gmail.com> wrote: On Tue, Jul 11, 2017 at 5:05 PM, Subramanya Sastry <ssas...@wikimedia.org> wrote: On 07/11/2017 05:13 AM, Nicolas Ve

Re: [Wikitech-l] Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018

2017-07-13 Thread Subramanya Sastry
On 07/13/2017 02:18 AM, Nicolas Vervelle wrote: I think I've found some discrepancy between Linter reports. On frwiki, the page "Discussion:Yasser Arafat" is reported in the list for self-closed-tag [1], but when I run the text of the page through the transform API [2], I only get errors for

Re: [Wikitech-l] Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018

2017-07-12 Thread Subramanya Sastry
On 07/12/2017 01:12 AM, Nicolas Vervelle wrote: Hi Subbu, Using the localized names, I've found that not all Linter categories are listed in the API result. Is it normal ? For example, on frwiki, Linter reports 3 "mixed-content" errors for "Les Trolls (film)" but this category is not in the

Re: [Wikitech-l] Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018

2017-07-11 Thread Subramanya Sastry
On 07/11/2017 05:13 AM, Nicolas Vervelle wrote: But I have a few questions / suggestions regarding Linter for the moment: - Is it possible to also retrieve the localized names of the Linter categories and priorities: for example, on frwiki, you can see on the Linter page [1] that

Re: [Wikitech-l] Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018

2017-07-09 Thread Subramanya Sastry
Thanks Pine! One other related comment that perhaps I should have made earlier and that is relevant based on your broader question around efforts of editors, bots, where to spend time fixing things. https://www.mediawiki.org/wiki/Help:Extension:Linter#Why_and_what_to_fix tries to clarify

Re: [Wikitech-l] Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018

2017-07-07 Thread Subramanya Sastry
- On the Full Analysis window, the second button with a globe and a broom (Subbu, would you have a recommended icon for Linter related stuff ?) I will have to get back to you on this. I'll have to get some help from someone who can design / recommend something appropriate here. I

Re: [Wikitech-l] Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018

2017-07-07 Thread Subramanya Sastry
On 07/07/2017 04:05 PM, Chad wrote: On Thu, Jul 6, 2017 at 5:02 AM Subramanya Sastry <ssas...@wikimedia.org> wrote: 6. Tools to assist editors: Linter & ParserMigration In October 2016, at the parsing team offsite, Kunal ([[Us

Re: [Wikitech-l] Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018

2017-07-06 Thread Subramanya Sastry
On 07/06/2017 05:09 PM, Nicolas Vervelle wrote: Since the start of the Linter project (when we started off with the GSoC prototype in summer of 2014, and once again when Kunal picked it up in 2016), we have been in conversation with Nico V (frwiki and who maintains WPCleaner) and with Marios

Re: [Wikitech-l] Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018

2017-07-06 Thread Subramanya Sastry
On 07/06/2017 09:59 AM, Pine W wrote: Thanks for the information. I understand that moving from HTML 4 to HTML 5 is probably a good idea. It is a good and necessary step. We want MediaWiki (and wikipedia) output to keep up with web standards. However, I am concerned about this statement:

[Wikitech-l] Tidy will be replaced by RemexHTML on Wikimedia wikis latest by June 2018

2017-07-06 Thread Subramanya Sastry
How to read this post? -- * For those without time to read lengthy technical emails, read the TL;DR section. * For those who don't care about all the details but want to help with this project, you can read sections 1 and 2 about Tidy, and then skip to section 7. * For

Re: [Wikitech-l] @author annotations in files in the mediawiki codebase

2017-06-13 Thread Subramanya Sastry
On 06/13/2017 03:33 AM, Antoine Musso wrote: Jon Robson opened a task about it a year or so ago: "Remove @author lines from code" https://phabricator.wikimedia.org/T139301 Aha .. thanks! I was looking at a 2016 code review y'day and noticed my comment about @author there (

Re: [Wikitech-l] @author annotations in files in the mediawiki codebase

2017-06-13 Thread Subramanya Sastry
On 06/13/2017 10:14 AM, C. Scott Ananian wrote: On Jun 13, 2017 6:24 AM, "Gergo Tisza" <gti...@wikimedia.org> wrote: On Tue, Jun 13, 2017 at 7:11 AM, Subramanya Sastry <ssas...@wikimedia.org> wrote: I find these annotations misleading and wonder why they exist and w

Re: [Wikitech-l] @author annotations in files in the mediawiki codebase

2017-06-13 Thread Subramanya Sastry
Whether moving CREDITS to a centralised file actually solves the problem, rather than just shifting it around, is debatable. I think a centralized credits / contributors file is far better since it recognizes contributions even if those changes have since been all edited / rewritten /

[Wikitech-l] @author annotations in files in the mediawiki codebase

2017-06-12 Thread Subramanya Sastry
I noticed that core files have @author annotations. My take on this is as follows: Any active codebase (such as mediawiki) sees so much constant change, with code refactored, rewritten, renamed, files moved around, split up, etc., that the only meaningful interpretation of "@author" would be someone

Re: [Wikitech-l] Setting up multiple Parsoid servers behind load balancer

2017-06-09 Thread Subramanya Sastry
On 06/09/2017 09:57 AM, Gabriel Wicke wrote: On Fri, Jun 9, 2017 at 12:56 AM, Alexandros Kosiaris < akosia...@wikimedia.org> wrote: I also don't think you need RESTBase as long as you are willing to wait for parsoid to finish parsing and returning the result. Apart from performance, there is

Re: [Wikitech-l] Parsoid: node 0.1x deprecated now; node 0.1x support will end March 31st, 2017

2017-04-07 Thread Subramanya Sastry
, Subramanya Sastry wrote: The Parsing team at the Wikimedia Foundation that develops the Parsoid service is deprecating support for node 0.1x. Parsoid is the service that powers VisualEditor, Content Translation, and Flow. If you don't run a MediaWiki install that uses VisualEditor

Re: [Wikitech-l] +2 request for yurik in mediawiki and maps-dev

2017-01-24 Thread Subramanya Sastry
On 01/25/2017 10:04 AM, Brian Wolff wrote: On Tuesday, January 24, 2017, Legoktm wrote: Hi, After speaking with Yurik, I've filed on his behalf to restore his membership in the mediawiki and maps-dev groups. I would

[Wikitech-l] Dev Summit 2017 session: DOM-based semantics for wikitext

2016-11-22 Thread Subramanya Sastry
Hello everyone, I have proposed a 2017 Dev Summit session titled "Improved editability, tooling, reasoning, and performance by adopting DOM-based semantics for wikitext" [1]. The TL;DR summary is to treat a wikitext article as made up of document fragments that are composed together

[Wikitech-l] Parsoid: node 0.1x deprecated now; node 0.1x support will end March 31st, 2017

2016-11-15 Thread Subramanya Sastry
right away from the master branch of Parsoid. Going forward, the Parsoid codebase will adopt ES6 features available in node v4.x and higher which aren't supported in node 0.1x and will constitute a breaking change. Subramanya Sastry (Subbu), Technical Lead and Manager, Parsing Team, Wikimedia

[Wikitech-l] [MediaWiki-announce] Parsoid Security Update and Release

2016-11-02 Thread Subramanya Sastry
and as such, this exploit wasn't available for an exploit to steal user sessions. Thanks to the reporter of this exploit, Darian Patrick from the Security Team, Arlo Breault from the Parsing Team, Daniel Zahn and others from Ops for their assistance handling this bug and preparing this release. Subramanya Sastry

[Wikitech-l] Parsoid Security Update and Release

2016-11-01 Thread Subramanya Sastry
://gerrit.wikimedia.org/r/#/c/319115 [2] https://www.mediawiki.org/wiki/Parsoid/Deployments#Monday.2C_October_31.2C_2016_around_1:15_PT:_Y.C2.A0Deployed_e503e801 [3] https://releases.wikimedia.org/debian/pool/main/p/parsoid/ [4] https://www.npmjs.com/package/parsoid Subramanya Sastry, Technical Lead

Re: [Wikitech-l] Deploying the Linter extension to Wikimedia wikis

2016-10-24 Thread Subramanya Sastry
On 10/24/2016 08:42 AM, MZMcBride wrote: Does the extension distinguish between errors and warnings? Are there gradations of errors? For example, deprecated syntax v. invalid syntax? Technically, there are no errors with wikitext ... but yes, Parsoid knows what some of these "errors" are and

Re: [Wikitech-l] Loosing the history of our projects to bitrot. Was: Acquiring list of templates including external links

2016-08-06 Thread Subramanya Sastry
On 08/03/2016 10:48 PM, Subramanya Sastry wrote: On 08/03/2016 07:17 PM, Rob Lanphier wrote: On Mon, Aug 1, 2016 at 10:15 PM, Subramanya Sastry <ssas...@wikimedia.org> wrote: ... I think it is feasible to get there. But, whether we want a spec for wikitext and should work t

Re: [Wikitech-l] Loosing the history of our projects to bitrot. Was: Acquiring list of templates including external links

2016-08-03 Thread Subramanya Sastry
On 08/03/2016 07:17 PM, Rob Lanphier wrote: On Mon, Aug 1, 2016 at 10:15 PM, Subramanya Sastry <ssas...@wikimedia.org> wrote: When [a detailed list of stuff is] done, it become far more feasible to think of defining a spec for wikitext parsing that is not tied to the internals of med

Re: [Wikitech-l] Loosing the history of our projects to bitrot. Was: Acquiring list of templates including external links

2016-08-01 Thread Subramanya Sastry
TL;DR: You get to a spec by paying down technical debt that untangles wikitext parsing from being intricately tied to the internals of mediawiki implementation and state. In discussions, there is far too much focus on the fact that you cannot write a BNF grammar or yacc / lex / bison /

Re: [Wikitech-l] Loosing the history of our projects to bitrot. Was: Acquiring list of templates including external links

2016-08-01 Thread Subramanya Sastry
On 08/01/2016 11:37 AM, Marc-Andre wrote: ... Is there something we can do to make the passage of years hurt less? Should we be laying groundwork now to prevent issues decades away? One possibility is considering storing rendered HTML for old revisions. It lets wikitext (and hence parser)

Re: [Wikitech-l] some statistics about auto-inserted <nowiki>

2016-05-18 Thread Subramanya Sastry
Thanks Amir .. https://github.com/wikimedia/parsoid/blob/master/tools/fetch_ve_nowiki_edits.js is a quick hackjob of a script that I pulled together back in Oct 2015 which I used for a while to monitor counts (and the actual incidents) of nowikis ... This script could use a refresh and

Re: [Wikitech-l] Kunal (User:Legoktm) moving to Parsing Team

2016-04-08 Thread Subramanya Sastry
On 04/08/2016 05:14 AM, Ricordisamoa wrote: You mean subtract 1, or add -1 *facepalm* :-) On 05/04/2016 18:53, Subramanya Sastry wrote: I suppose you've figured out that I don't know how to write citations. Subtract -1 from N for all [N] in the body. :-) -S. On 04/05/2016 11:41 AM

Re: [Wikitech-l] Kunal (User:Legoktm) moving to Parsing Team

2016-04-05 Thread Subramanya Sastry
I suppose you've figured out that I don't know how to write citations. Subtract -1 from N for all [N] in the body. :-) -S. On 04/05/2016 11:41 AM, Subramanya Sastry wrote: Hi everyone, We would like to let you know that Kunal (User:Legoktm for those who don’t already know) is moving inside

[Wikitech-l] Kunal (User:Legoktm) moving to Parsing Team

2016-04-05 Thread Subramanya Sastry
Hi everyone, We would like to let you know that Kunal (User:Legoktm for those who don’t already know) is moving inside the Editing Department from the Collaboration Team to the Parsing Team. The Collaboration team is grateful for Kunal’s great work over the past year, especially on the

Re: [Wikitech-l] <translate> tags are a usability nightmare for editing on mediawiki.org

2016-04-05 Thread Subramanya Sastry
2. Identifying document fragments for translation is another instance of the same problem of associating metadata with document fragments *across edits*. Citations, content-translation, comments-as-documentation, authorship information, maintaining-association-between-translated-fragments, etc.

Re: [Wikitech-l] <translate> tags are a usability nightmare for editing on mediawiki.org

2016-04-04 Thread Subramanya Sastry
Niklas and the language team: thanks for your efforts in enabling translation features. They are truly important and necessary. As for the topic of hacks, I feel wikitext's history has been one where people have stepped in to address critical issues / needs that existed at the time with

Re: [Wikitech-l] Minor REST API cleanup: Remove experimental listings, make timeuuid parameter mandatory for data-parsoid

2016-03-07 Thread Subramanya Sastry
On 03/07/2016 07:49 PM, Gabriel Wicke wrote: tl;dr: You are *very* likely not affected. As a Parsoid-side clarification, data-parsoid is considered private information. This information is primarily used by Parsoid to minimize dirty diffs when edited HTML is converted to wikitext. So,

Re: [Wikitech-l] External images in VE/Parsoid

2016-02-23 Thread Subramanya Sastry
On 02/23/2016 03:23 PM, James Montalvo wrote: So img tags are not whitelisted, and thus they are treated as a literal string. By default the PHP parser does the same thing, but there's $wgAllowImageTag to allow img tags. ... I totally understand that bare image tags are not ideal in many

Re: [Wikitech-l] External images in VE/Parsoid

2016-02-23 Thread Subramanya Sastry
On 02/23/2016 02:46 PM, James Montalvo wrote: Why does it treat the img tag as a literal string, but not an h2 tag, for example? This is what the PHP parser does as well You can try it in a sandbox [1], for example. That is understandable because bare image tags can link to all kinds of

Re: [Wikitech-l] External images in VE/Parsoid

2016-02-23 Thread Subramanya Sastry
ng valid/trusted HTML (via the isHTML flag), it could wrap the HTML in a DOM fragment and tunnel it through the sanitizer .. for example, as with the tag. Subbu. --James On Tue, Feb 23, 2016 at 1:24 PM, Subramanya Sastry <ssas...@wikimedia.org> wrote: On 02/23/2016 12:45 PM, James

Re: [Wikitech-l] External images in VE/Parsoid

2016-02-23 Thread Subramanya Sastry
On 02/23/2016 12:45 PM, James Montalvo wrote: I'm trying to make images from an external source, provided by a parser function, work with VisualEditor and Parsoid. For a very simplified illustration I added the following to the bottom of LocalSettings.php ```

Re: [Wikitech-l] Parsoid entrypoint http://parsoid-lb.eqiad.wikimedia.org being decommissioned

2016-01-30 Thread Subramanya Sastry
On 01/30/2016 09:50 AM, Bartosz Dziewoński wrote: So what is the replacement for http://parsoid-lb.eqiad.wikimedia.org/_wikitext/ if I just want to see how Parsoid renders a piece of wikitext? It seems the fancy forms at https://en.wikipedia.org/api/rest_v1/?doc don't actually allow me to do

[Wikitech-l] Parsoid entrypoint http://parsoid-lb.eqiad.wikimedia.org being decommissioned

2016-01-29 Thread Subramanya Sastry
Hello everyone, The public Parsoid endpoint http://parsoid-lb.eqiad.wikimedia.org is being decommissioned [1] once we migrate over all straggler references to that endpoint [1] possibly as soon as 3 weeks from now. As far as we know, there are very few requests to that endpoint right now,

Re: [Wikitech-l] Parsoid still doesn't love me

2015-11-09 Thread Subramanya Sastry
On 11/09/2015 12:37 PM, Petr Bena wrote: Do you really want to say that reading from disk is faster than processing the text using CPU? I don't know how complex syntax of mw actually is, but C++ compilers are probably much faster than parsoid, if that's true. And these are very slow. What takes

Re: [Wikitech-l] Parsoid convert arbitrary HTML?

2015-11-06 Thread Subramanya Sastry
On 11/06/2015 10:18 AM, James Montalvo wrote: Can Parsoid be used to convert arbitrary HTML to wikitext? It's not clear to me whether it will only work with Parsoid's HTML+RDFa. I'm wondering if I could take snippets of HTML from non-MediaWiki webpages and convert them into wikitext. The right

Re: [Wikitech-l] Parsoid convert arbitrary HTML?

2015-11-06 Thread Subramanya Sastry
gwi...@wikimedia.org> wrote: To add to what Eric & Subbu have said, here is a link to the API documentation for this end point: https://en.wikipedia.org/api/rest_v1/?doc#!/Transforms/post_transform_html_to_wikitext_title_revision On Fri, Nov 6, 2015 at 8:47 AM, Subramanya Sastry <ssas...@wikimed

Re: [Wikitech-l] Parsoid still doesn't love me

2015-11-06 Thread Subramanya Sastry
Parsoid is simply a wikitext -> html and a html -> wikitext conversion service. Everything else would be tools and libs built on top of it. Subbu. On 11/06/2015 02:29 PM, Ricordisamoa wrote: What if I need to get all revisions (~2000) of a page in Parsoid HTML5? The prop=revisions API (in

Re: [Wikitech-l] Red links

2015-11-03 Thread Subramanya Sastry
See T39902 which I see that James has already added to the phab ticket here. The reason why we don't have redlinks as part of Parsoid output is that it defeats caching / storage. Every time a page is created, all pages that link to it have to be flushed from storage. Subbu. On 11/03/2015

Re: [Wikitech-l] Reserving data-mw- attribute prefix in the sanitizer as non-user specifiable

2015-11-02 Thread Subramanya Sastry
On 11/02/2015 05:11 AM, Brian Wolff wrote: We already reserve data-ooui (by reserve, I mean blacklist in the sanitizer). But it feels wrong to use that for parts of mw that are not ooui. I would like to propose that we reserve data-mw- prefix as well for general usage by mediawiki/extensions

Re: [Wikitech-l] Inhibitors for Mobile Content Service to use Parsoid output

2015-10-19 Thread Subramanya Sastry
On 10/16/2015 01:14 PM, Bernd Sitzmann wrote: In any case, given that Parsoid's HTML clients usually talk through RESTBase rather than with Parsoid directly, this optional API flag would also have to be supported in RESTBase, and could potentially follow the redirects on behalf of the clients.

Re: [Wikitech-l] Inhibitors for Mobile Content Service to use Parsoid output

2015-10-16 Thread Subramanya Sastry
On 10/15/2015 08:52 PM, Bernd Sitzmann wrote: As part of moving the Mobile Content Service to use Parsoid instead of action=mobileview[1] I've run into several missing features which make it significantly harder for the Mobile Content Service to use Parsoid, while providing the same

Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-09-17 Thread Subramanya Sastry
On 09/17/2015 07:44 PM, Ricordisamoa wrote: Stephen Niedzielski: "it seems like, as soon as you get the HTML the first thing you want to do, perhaps a little bit ironically because it's called Parsoid, it's parse the output a little bit more" https://www.youtube.com/watch?v=3WJID_WC7BQ=35m14s

[Wikitech-l] Visual diffing updates

2015-09-14 Thread Subramanya Sastry
https://github.com/subbuss/parsoid_visual_diffs I've pushed a bunch of updates over the last week which should now make this usable for comparing HTML files from different sources (not restricted to PHP parser and Parsoid). I did this so that this could be used to compare the rendering of

Re: [Wikitech-l] RFC: Replace Tidy with HTML 5 parse/reserialize

2015-08-19 Thread Subramanya Sastry
On 08/19/2015 08:22 AM, MZMcBride wrote: And, as several others have noted, you can't just disable Tidy, since the effects of unclosed tags are not confined to the content area, and there is a large amount of existing content that depends on it. I have seen the effects of Tidy being accidentally

Re: [Wikitech-l] RFC: Replace Tidy with HTML 5 parse/reserialize

2015-08-18 Thread Subramanya Sastry
On 08/18/2015 07:58 AM, MZMcBride wrote: Subramanya Sastry wrote: * Unclosed HTML tags (very common) * Misnested tags * Misnesting of tags (ex: links in links .. [http://foo.bar this is a [[foobar]] company]) * Fostered content in tables (<table>this-content-will-show-up-outside-the-table<tr><td>

Re: [Wikitech-l] RFC: Replace Tidy with HTML 5 parse/reserialize

2015-08-17 Thread Subramanya Sastry
On 08/17/2015 10:15 PM, MZMcBride wrote: Failing fast and loud is good in lots of contexts. I don't think wiki editing is one of them. The only cited example of real breakage so far has been mismatched divs. How often are you or anyone else adding divs to pages? In my experience, most users rely

Re: [Wikitech-l] [Engineering] Content WG: Templating, Page Components editing

2015-08-12 Thread Subramanya Sastry
On 08/12/2015 09:33 AM, Brad Jorsch (Anomie) wrote: On Tue, Aug 11, 2015 at 7:12 PM, Gabriel Wicke gwi...@wikimedia.org wrote: TL;DR: Join us to discuss Templates, Page Components editing on Thu, 13 August, 12:45 – 14:00 PDT [0]. I can't, so I'll just

Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-07-31 Thread Subramanya Sastry
On 07/31/2015 12:55 PM, Ricordisamoa wrote: Hi Subbu, thank you for this thoughtful insight. And thank you for starting this thread. :-) HTML is not a barrier by itself. The problem seems to be Parsoid being built primarily with VisualEditor in mind. While we want the DOM to be

Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-07-24 Thread Subramanya Sastry
On 07/23/2015 01:07 PM, Ricordisamoa wrote: Il 23/07/2015 15:28, Antoine Musso ha scritto: Le 23/07/2015 08:15, Ricordisamoa a écrit : Are there any stable APIs for an application to get a parse tree in machine-readable format, manipulate it and send the result back without touching HTML? I'm

Re: [Wikitech-l] Parsoid announcement: Main roundtrip quality target achieved

2015-06-29 Thread Subramanya Sastry
On 06/29/2015 09:20 AM, Brad Jorsch (Anomie) wrote: On Fri, Jun 26, 2015 at 11:52 AM, Subramanya Sastry ssas...@wikimedia.org wrote: The PHP parser used in production has 3 components: the preprocessor, the core parser, Tidy. Parsoid relies on the PHP preprocessor (access via the mediawiki API

Re: [Wikitech-l] [Engineering] Parsoid announcement: Main roundtrip quality target achieved

2015-06-29 Thread Subramanya Sastry
On 06/29/2015 09:19 AM, Brad Jorsch (Anomie) wrote: On Thu, Jun 25, 2015 at 6:22 PM, Subramanya Sastry ssas...@wikimedia.org wrote: * Pare down rendering differences between the two systems so that we can start thinking about using Parsoid HTML instead
