Re: [Wikitech-l] Html.php line 269

2015-02-18 Thread Petr Bena
I was digging and I found that commit you sent - eefe1b13a but you can
see that this commit modified 2 functions: both open and closeElement,
however for some reason the code was removed from closeElement so now
calling openElement('html') would produce empty string, while
closeElement('html') would produce /html and that should IMHO create
HTML code that would only contain closing tags in case that html /
head didn't have any parameters. Actually head, at least on wmf
installation doesn't have any parameters, but it has opening tag,
which is weird. (Perhaps wgWellFormedXml is true by default?)

On Wed, Feb 18, 2015 at 10:31 PM, Brion Vibber bvib...@wikimedia.org wrote:
 On Wed, Feb 18, 2015 at 1:23 PM, Petr Bena benap...@gmail.com wrote:

 Can someone explain the point of these lines to me?

 https://github.com/wikimedia/mediawiki/blob/master/includes/Html.php#L269


 Someone thought it would be clever to reduce page size by a few bytes here
 and there. :)

 In practice we always have attributes on the html so this would only take
 effect on head for this particular case. However there are other bits
 which can be more aggressively dropped, such as closing tags for table
 parts etc.

 Here's the commit that added this and related bits:
 https://github.com/wikimedia/mediawiki/commit/eefe1b13a382a7d11c2137bbf900b783ee445323


 If these tags are optional, they can be there, so why remove them? If
 you remove them, you should probably also care about them in
 closeElement() so that mediawiki doesn't produce html which has only
 ending tags for html and head.


 I believe that's also covered, yes. (Keep in mind also that the way the
 HTML content model works is that those elements always exist in the DOM;
 the tags can simply be omitted from the markup because they are implied by
 the surrounding content.)



 It seems that on WMF installation we don't use anyway as there is head
 tag. So why is it there? What purpose does it server?


 I think we still have the well-formed XML mode enabled for
 backwards-compatibility?

 -- brion
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Html.php line 269

2015-02-18 Thread Brion Vibber
On Wed, Feb 18, 2015 at 1:23 PM, Petr Bena benap...@gmail.com wrote:

 Can someone explain the point of these lines to me?

 https://github.com/wikimedia/mediawiki/blob/master/includes/Html.php#L269


Someone thought it would be clever to reduce page size by a few bytes here
and there. :)

In practice we always have attributes on the html so this would only take
effect on head for this particular case. However there are other bits
which can be more aggressively dropped, such as closing tags for table
parts etc.

Here's the commit that added this and related bits:
https://github.com/wikimedia/mediawiki/commit/eefe1b13a382a7d11c2137bbf900b783ee445323


 If these tags are optional, they can be there, so why remove them? If
 you remove them, you should probably also care about them in
 closeElement() so that mediawiki doesn't produce html which has only
 ending tags for html and head.


I believe that's also covered, yes. (Keep in mind also that the way the
HTML content model works is that those elements always exist in the DOM;
the tags can simply be omitted from the markup because they are implied by
the surrounding content.)



 It seems that on WMF installation we don't use anyway as there is head
 tag. So why is it there? What purpose does it server?


I think we still have the well-formed XML mode enabled for
backwards-compatibility?

-- brion
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Html.php line 269

2015-02-18 Thread Petr Bena
I think it's the other way, omiting the html tags is probably correct,
as long as both opening and ending tags are omitted :P so I think the
closeElement function needs a fix

On Wed, Feb 18, 2015 at 10:42 PM, Ricordisamoa
ricordisa...@openmailbox.org wrote:
 Uploaded https://gerrit.wikimedia.org/r/191473/ for consistency with
 closeElement.
 Thanks for reporting.

 Il 18/02/2015 22:38, Petr Bena ha scritto:

 I was digging and I found that commit you sent - eefe1b13a but you can
 see that this commit modified 2 functions: both open and closeElement,
 however for some reason the code was removed from closeElement so now
 calling openElement('html') would produce empty string, while
 closeElement('html') would produce /html and that should IMHO create
 HTML code that would only contain closing tags in case that html /
 head didn't have any parameters. Actually head, at least on wmf
 installation doesn't have any parameters, but it has opening tag,
 which is weird. (Perhaps wgWellFormedXml is true by default?)

 On Wed, Feb 18, 2015 at 10:31 PM, Brion Vibber bvib...@wikimedia.org
 wrote:

 On Wed, Feb 18, 2015 at 1:23 PM, Petr Bena benap...@gmail.com wrote:

 Can someone explain the point of these lines to me?


 https://github.com/wikimedia/mediawiki/blob/master/includes/Html.php#L269

 Someone thought it would be clever to reduce page size by a few bytes
 here
 and there. :)

 In practice we always have attributes on the html so this would only
 take
 effect on head for this particular case. However there are other bits
 which can be more aggressively dropped, such as closing tags for table
 parts etc.

 Here's the commit that added this and related bits:

 https://github.com/wikimedia/mediawiki/commit/eefe1b13a382a7d11c2137bbf900b783ee445323


 If these tags are optional, they can be there, so why remove them? If
 you remove them, you should probably also care about them in
 closeElement() so that mediawiki doesn't produce html which has only
 ending tags for html and head.

 I believe that's also covered, yes. (Keep in mind also that the way the
 HTML content model works is that those elements always exist in the DOM;
 the tags can simply be omitted from the markup because they are implied
 by
 the surrounding content.)


 It seems that on WMF installation we don't use anyway as there is head
 tag. So why is it there? What purpose does it server?

 I think we still have the well-formed XML mode enabled for
 backwards-compatibility?

 -- brion
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l



 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Html.php line 269

2015-02-18 Thread Petr Bena
That patch you made even breaks tests:
https://integration.wikimedia.org/ci/job/mediawiki-core-phpcs-HEAD/13002/console
:)

so either remove a unit test for that feature, or just fix the
closeElement but I don't even know if it's really broken, maybe there
is some more magic on behing that would fix the broken html before it
gets printed out

On Wed, Feb 18, 2015 at 10:47 PM, Petr Bena benap...@gmail.com wrote:
 I think it's the other way, omiting the html tags is probably correct,
 as long as both opening and ending tags are omitted :P so I think the
 closeElement function needs a fix

 On Wed, Feb 18, 2015 at 10:42 PM, Ricordisamoa
 ricordisa...@openmailbox.org wrote:
 Uploaded https://gerrit.wikimedia.org/r/191473/ for consistency with
 closeElement.
 Thanks for reporting.

 Il 18/02/2015 22:38, Petr Bena ha scritto:

 I was digging and I found that commit you sent - eefe1b13a but you can
 see that this commit modified 2 functions: both open and closeElement,
 however for some reason the code was removed from closeElement so now
 calling openElement('html') would produce empty string, while
 closeElement('html') would produce /html and that should IMHO create
 HTML code that would only contain closing tags in case that html /
 head didn't have any parameters. Actually head, at least on wmf
 installation doesn't have any parameters, but it has opening tag,
 which is weird. (Perhaps wgWellFormedXml is true by default?)

 On Wed, Feb 18, 2015 at 10:31 PM, Brion Vibber bvib...@wikimedia.org
 wrote:

 On Wed, Feb 18, 2015 at 1:23 PM, Petr Bena benap...@gmail.com wrote:

 Can someone explain the point of these lines to me?


 https://github.com/wikimedia/mediawiki/blob/master/includes/Html.php#L269

 Someone thought it would be clever to reduce page size by a few bytes
 here
 and there. :)

 In practice we always have attributes on the html so this would only
 take
 effect on head for this particular case. However there are other bits
 which can be more aggressively dropped, such as closing tags for table
 parts etc.

 Here's the commit that added this and related bits:

 https://github.com/wikimedia/mediawiki/commit/eefe1b13a382a7d11c2137bbf900b783ee445323


 If these tags are optional, they can be there, so why remove them? If
 you remove them, you should probably also care about them in
 closeElement() so that mediawiki doesn't produce html which has only
 ending tags for html and head.

 I believe that's also covered, yes. (Keep in mind also that the way the
 HTML content model works is that those elements always exist in the DOM;
 the tags can simply be omitted from the markup because they are implied
 by
 the surrounding content.)


 It seems that on WMF installation we don't use anyway as there is head
 tag. So why is it there? What purpose does it server?

 I think we still have the well-formed XML mode enabled for
 backwards-compatibility?

 -- brion
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l



 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Html.php line 269

2015-02-18 Thread Petr Bena
Most of browsers would probably handle it OK, but validators not.

On Wed, Feb 18, 2015 at 10:45 PM, Petr Bena benap...@gmail.com wrote:
 Yep

 That makes me feel like this is bug which everyone overlooked because
 it's true by default. If you set it to false mediawiki would likely
 write out syntactically wrong code.

 On Wed, Feb 18, 2015 at 10:43 PM, Gergo Tisza gti...@wikimedia.org wrote:
 On Wed, Feb 18, 2015 at 1:38 PM, Petr Bena benap...@gmail.com wrote:

 (Perhaps wgWellFormedXml is true by default?)


 It is: https://www.mediawiki.org/wiki/Manual:$wgWellFormedXml
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Html.php line 269

2015-02-18 Thread Ricordisamoa
Uploaded https://gerrit.wikimedia.org/r/191473/ for consistency with 
closeElement.

Thanks for reporting.

Il 18/02/2015 22:38, Petr Bena ha scritto:

I was digging and I found that commit you sent - eefe1b13a but you can
see that this commit modified 2 functions: both open and closeElement,
however for some reason the code was removed from closeElement so now
calling openElement('html') would produce empty string, while
closeElement('html') would produce /html and that should IMHO create
HTML code that would only contain closing tags in case that html /
head didn't have any parameters. Actually head, at least on wmf
installation doesn't have any parameters, but it has opening tag,
which is weird. (Perhaps wgWellFormedXml is true by default?)

On Wed, Feb 18, 2015 at 10:31 PM, Brion Vibber bvib...@wikimedia.org wrote:

On Wed, Feb 18, 2015 at 1:23 PM, Petr Bena benap...@gmail.com wrote:


Can someone explain the point of these lines to me?

https://github.com/wikimedia/mediawiki/blob/master/includes/Html.php#L269


Someone thought it would be clever to reduce page size by a few bytes here
and there. :)

In practice we always have attributes on the html so this would only take
effect on head for this particular case. However there are other bits
which can be more aggressively dropped, such as closing tags for table
parts etc.

Here's the commit that added this and related bits:
https://github.com/wikimedia/mediawiki/commit/eefe1b13a382a7d11c2137bbf900b783ee445323



If these tags are optional, they can be there, so why remove them? If
you remove them, you should probably also care about them in
closeElement() so that mediawiki doesn't produce html which has only
ending tags for html and head.


I believe that's also covered, yes. (Keep in mind also that the way the
HTML content model works is that those elements always exist in the DOM;
the tags can simply be omitted from the markup because they are implied by
the surrounding content.)



It seems that on WMF installation we don't use anyway as there is head
tag. So why is it there? What purpose does it server?


I think we still have the well-formed XML mode enabled for
backwards-compatibility?

-- brion
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Html.php line 269

2015-02-18 Thread Gergo Tisza
On Wed, Feb 18, 2015 at 1:38 PM, Petr Bena benap...@gmail.com wrote:

 (Perhaps wgWellFormedXml is true by default?)


It is: https://www.mediawiki.org/wiki/Manual:$wgWellFormedXml
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Html.php line 269

2015-02-18 Thread Petr Bena
Yep

That makes me feel like this is bug which everyone overlooked because
it's true by default. If you set it to false mediawiki would likely
write out syntactically wrong code.

On Wed, Feb 18, 2015 at 10:43 PM, Gergo Tisza gti...@wikimedia.org wrote:
 On Wed, Feb 18, 2015 at 1:38 PM, Petr Bena benap...@gmail.com wrote:

 (Perhaps wgWellFormedXml is true by default?)


 It is: https://www.mediawiki.org/wiki/Manual:$wgWellFormedXml
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Html.php line 269

2015-02-18 Thread Tim Starling
On 19/02/15 08:43, Gergo Tisza wrote:
 On Wed, Feb 18, 2015 at 1:38 PM, Petr Bena benap...@gmail.com wrote:
 
 (Perhaps wgWellFormedXml is true by default?)
 
 
 It is: https://www.mediawiki.org/wiki/Manual:$wgWellFormedXml

There was a Bugzilla report and Gerrit change requesting that it be
set to false:

https://phabricator.wikimedia.org/T52040
https://gerrit.wikimedia.org/r/#/c/70036/

I was against it, partly because of the omitted head tag.

-- Tim Starling


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Html.php line 269

2015-02-18 Thread Gabriel Wicke
I would also recommend against actively trying to emit barely parsing
output. Any savings after compression should be rather small, and if only
end tags are omitted the DOM will of course still be the same size after
parsing.

In Parsoid we went to some modest lengths
https://github.com/wikimedia/parsoid/blob/master/lib/XMLSerializer.js to
produce polyglot markup http://www.w3.org/TR/html-polyglot/, which is
both valid XML and HTML5. This has enabled consumers to use either XML or
HTML5 parsers, which has proven very useful in practice. For example, this
makes it easier to consume this content using PHP's libxml. Doing the same
in MediaWiki core is admittedly harder, but I still think that we should
follow the robustness principle
https://en.wikipedia.org/wiki/Robustness_principle wherever we can.

Gabriel

On Wed, Feb 18, 2015 at 5:59 PM, Tim Starling tstarl...@wikimedia.org
wrote:

 On 19/02/15 08:43, Gergo Tisza wrote:
  On Wed, Feb 18, 2015 at 1:38 PM, Petr Bena benap...@gmail.com wrote:
 
  (Perhaps wgWellFormedXml is true by default?)
 
 
  It is: https://www.mediawiki.org/wiki/Manual:$wgWellFormedXml

 There was a Bugzilla report and Gerrit change requesting that it be
 set to false:

 https://phabricator.wikimedia.org/T52040
 https://gerrit.wikimedia.org/r/#/c/70036/

 I was against it, partly because of the omitted head tag.

 -- Tim Starling


 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l