Re: [Wikitech-l] Getting rid of $wgWellFormedXml = false;

2016-05-14 Thread Brian Wolff
On Saturday, May 14, 2016, Strainu wrote: > 2016-05-14 4:07 GMT+03:00 Legoktm : >> Hi, >> >> On 05/02/2016 11:42 AM, Brian Wolff wrote: >>> See gerrit patch https://gerrit.wikimedia.org/r/286495 I would >>> appreciate everyone's feedback. >> >> Given the lack of objections here and on Gerrit, I we

Re: [Wikitech-l] Getting rid of $wgWellFormedXml = false;

2016-05-14 Thread Antoine Musso
On 14/05/2016 at 03:07, Legoktm wrote: > Hi, > > On 05/02/2016 11:42 AM, Brian Wolff wrote: >> See gerrit patch https://gerrit.wikimedia.org/r/286495 I would >> appreciate everyone's feedback. > > Given the lack of objections here and on Gerrit, I went ahead and merged > it today. Hello, That

Re: [Wikitech-l] Getting rid of $wgWellFormedXml = false;

2016-05-14 Thread Strainu
2016-05-14 4:07 GMT+03:00 Legoktm : > Hi, > > On 05/02/2016 11:42 AM, Brian Wolff wrote: >> See gerrit patch https://gerrit.wikimedia.org/r/286495 I would >> appreciate everyone's feedback. > > Given the lack of objections here and on Gerrit, I went ahead and merged > it today. Can you please clar

Re: [Wikitech-l] Getting rid of $wgWellFormedXml = false;

2016-05-13 Thread Legoktm
Hi, On 05/02/2016 11:42 AM, Brian Wolff wrote: > See gerrit patch https://gerrit.wikimedia.org/r/286495 I would > appreciate everyone's feedback. Given the lack of objections here and on Gerrit, I went ahead and merged it today. -- Legoktm

Re: [Wikitech-l] Getting rid of $wgWellFormedXml = false;

2016-05-03 Thread Brian Wolff
On Monday, May 2, 2016, Max Semenik wrote: > On Mon, May 2, 2016 at 3:04 PM, Brian Wolff wrote: > >> > At this point, I would say that everybody who screen-scrapes saw it coming > and breaking them is a good thing as sometimes, lessons just have to be > learned. > Personally, I don't think we sh

Re: [Wikitech-l] Getting rid of $wgWellFormedXml = false;

2016-05-03 Thread Gergo Tisza
On Tue, May 3, 2016 at 2:43 AM, Max Semenik wrote: > At this point, I would say that everybody who screen-scrapes saw it coming > and breaking them is a good thing as sometimes, lessons just have to be > learned. > There aren't many options other than content-scraping if you want to transform Wi

Re: [Wikitech-l] Getting rid of $wgWellFormedXml = false;

2016-05-03 Thread Gergo Tisza
On Tue, May 3, 2016 at 4:34 PM, Gergo Tisza wrote: > > There aren't many options other than content-scraping if you want to > transform Wikipedia articles into some semblance of structured data. We > even do it ourselves, for media metadata (and use an XML parser for it > Actually the XML parser

Re: [Wikitech-l] Getting rid of $wgWellFormedXml = false;

2016-05-02 Thread Max Semenik
On Mon, May 2, 2016 at 3:04 PM, Brian Wolff wrote: > > There are references to it breaking people's screen scraping bots last time > it was turned on. That was like 5 years ago though. > At this point, I would say that everybody who screen-scrapes saw it coming and breaking them is a good thing

Re: [Wikitech-l] Getting rid of $wgWellFormedXml = false;

2016-05-02 Thread Brian Wolff
> > The only benefit of $wgWellFormedXml was that you could toss your > "well-formed" tag soup into an XML parser that didn't grok HTML. I have no > idea if that worked reliably or was actually useful to anyone, but it's > probably worth confirming that before actually removing the funky > self-clo
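
To illustrate the point about XML parsers that don't grok HTML, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The two sample fragments and the helper name `parses_as_xml` are made up for illustration; they are not actual MediaWiki output or API. A strict XML parser accepts the self-closed form but rejects an unclosed HTML5-style void tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample fragments, not real MediaWiki output.
XML_STYLE = '<p>line one<br />line two</p>'
HTML5_STYLE = '<p>line one<br>line two</p>'


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # An unclosed <br> leaves </p> mismatched, which is a fatal XML error.
        return False


if __name__ == "__main__":
    print("XML-style fragment parses:", parses_as_xml(XML_STYLE))
    print("HTML5-style fragment parses:", parses_as_xml(HTML5_STYLE))
```

This only shows the parser behaviour on a toy fragment; whether any consumer actually relied on it for real page output is the open question raised in the message.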

Re: [Wikitech-l] Getting rid of $wgWellFormedXml = false;

2016-05-02 Thread Brion Vibber
I'd say an HTML5 output mode *ought* to work like this: *Don't try to be clever.* * Consistency and predictability are key to both security review and data consumability. *Quote attributes consistently and predictably.* * Always use double-quotes on attributes in output. *Output specced empty ta
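
The "always use double-quotes on attributes" rule could be sketched roughly as follows. This is an illustrative Python helper, not MediaWiki's implementation; the function name `render_attributes` is hypothetical. It relies on the standard library's `html.escape`, which with `quote=True` escapes `&`, `<`, `>`, `"` and `'`.

```python
import html
from typing import Mapping


def render_attributes(attrs: Mapping[str, str]) -> str:
    """Render attributes as name="value", always double-quoted.

    There is deliberately no "leave it unquoted when safe" branch:
    every value takes the same path, so output stays predictable.
    """
    parts = []
    for name, value in attrs.items():
        escaped = html.escape(value, quote=True)
        parts.append(f'{name}="{escaped}"')
    return " ".join(parts)


if __name__ == "__main__":
    print(render_attributes({"class": "a b", "title": 'say "hi"'}))
```

Having a single code path for every value is what makes the output easy to review for security and easy to consume, at the cost of a couple of bytes per attribute.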

[Wikitech-l] Getting rid of $wgWellFormedXml = false;

2016-05-02 Thread Brian Wolff
So currently, we have two ways of outputting HTML: $wgWellFormedXml = true (the default) outputs HTML that happens to conform to the rules of XML. $wgWellFormedXml = false, on the other hand, uses the laxer HTML5 rules to save a few bytes. Having two modes of output feels rather silly to me. Or
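
A rough sketch of what the two output modes amount to, written in Python for illustration only. The function `render_void_tag`, its `well_formed_xml` parameter, and the rule for when a value may go unquoted are all assumptions made for this example; they do not reproduce MediaWiki's actual Html class.

```python
import html
import re
from typing import Mapping

# Assumed, deliberately conservative rule: only values made of these
# characters are left unquoted in the lax mode of this sketch.
_SAFE_UNQUOTED = re.compile(r"[A-Za-z0-9_.-]+")


def render_void_tag(name: str, attrs: Mapping[str, str],
                    well_formed_xml: bool = True) -> str:
    """Render a void element such as <img> or <br> in one of two modes."""
    parts = [name]
    for key, value in attrs.items():
        if not well_formed_xml and _SAFE_UNQUOTED.fullmatch(value):
            # Lax HTML5 mode: drop the quotes to save two bytes.
            parts.append(f"{key}={value}")
        else:
            parts.append(f'{key}="{html.escape(value, quote=True)}"')
    # XML-compatible mode self-closes the tag; lax mode does not.
    closer = " />" if well_formed_xml else ">"
    return "<" + " ".join(parts) + closer


if __name__ == "__main__":
    attrs = {"src": "a.png", "alt": "A cat"}
    print(render_void_tag("img", attrs, well_formed_xml=True))
    print(render_void_tag("img", attrs, well_formed_xml=False))
```

In this toy version the lax mode saves only a handful of bytes per tag, while the extra branch doubles the number of output shapes that have to be tested and reviewed.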