On Mon, 08 Dec 2014 21:50:56 +0100, Simon Pieters <sim...@opera.com> wrote:
On Thu, 27 Nov 2014 01:15:20 +0100, Ian Hickson <i...@hixie.ch> wrote:
On Wed, 26 Nov 2014, Simon Pieters wrote:
- Make the end tag optional and have <menuitem>, <menu> and <hr>
generate implied </menuitem> end tags. (Maybe other tags like <li> and
<p> can also imply </menuitem>.) The label attribute would be honored if
specified, otherwise use the textContent with leading and trailing
whitespace trimmed.
This would allow either syntax unless I'm missing something.
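For illustration, the label rule proposed above could be sketched in Python roughly as follows. The function name and the use of None for "attribute not specified" are assumptions of this sketch, as is trimming only the ASCII whitespace characters that HTML treats as whitespace.

```python
from typing import Optional

# ASCII whitespace as defined by HTML: space, tab, LF, FF, CR.
# (Assumption: the proposal means this set when it says "whitespace".)
HTML_WHITESPACE = " \t\n\f\r"


def menuitem_label(label_attr: Optional[str], text_content: str) -> str:
    """Return the label for a hypothetical <menuitem>.

    label_attr is None when the label attribute is absent; otherwise it
    is the attribute value, which wins over the element's text content.
    """
    if label_attr is not None:
        return label_attr
    # Fall back to textContent with leading/trailing whitespace trimmed.
    return text_content.strip(HTML_WHITESPACE)
```

With this rule both <menuitem label="Copy"> and <menuitem> Copy </menuitem> would end up with the label "Copy".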
That's another option, yeah. Probably the best so far if we can't just
power through and break the sites in question. It's not yet clear to me
how many sites we're talking about here and how possible it is to
evangelise them.
In httparchive
(http://bigqueri.es/t/analyzing-html-css-and-javascript-response-bodies/442):
FTR, the numbers were slightly wrong. I didn't count top-level pages, I
counted resources (including e.g. iframes). Also there is a bug in the
data with duplicate entries for some pages
(https://twitter.com/zcorpan/status/542363458671747072 ).
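The difference between counting resources and counting pages can be shown with a small Python sketch. The rows below are made-up (page, resource URL) pairs, not httparchive data, and the function names are hypothetical; the point is only that a plain row count includes iframes and duplicate entries, while grouping by page counts each page once.

```python
from collections import Counter

# Hypothetical (page, resource_url) rows; the last one is a duplicate entry.
ROWS = [
    ("http://a.example/", "http://a.example/"),
    ("http://a.example/", "http://a.example/frame.html"),  # iframe resource
    ("http://b.example/", "http://b.example/"),
    ("http://b.example/", "http://b.example/"),            # duplicate row
]


def count_resources(rows):
    """Count every matching row, like COUNT(*) without grouping."""
    return len(rows)


def count_pages(rows):
    """Count distinct pages, like the number of rows after GROUP BY page."""
    return len({page for page, _ in rows})


def matches_per_page(rows):
    """Per-page match counts, highest first (GROUP BY page ORDER BY num desc)."""
    return Counter(page for page, _ in rows).most_common()
```

Here count_resources(ROWS) is 4 while count_pages(ROWS) is 2.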
* 10101 pages use <menuitem>
Corrected: 8929 pages use <menuitem>.
SELECT page, COUNT(*) as num
FROM [httparchive:runs.2014_08_15_requests_body]
WHERE mimeType CONTAINS "html"
AND REGEXP_MATCH(LOWER(body), r'<menuitem\s')
GROUP BY page
ORDER BY num desc
* 39 have no label attribute
* 0 have non-whitespace content
* 15 have no end tag
Based on this, it seems possible to keep it as a void element and only
use the label attribute.
SELECT COUNT(*) as num,
  CASE
    WHEN REGEXP_MATCH(LOWER(body), r'<menuitem\s([^>]+\s)?label\s*=')
      THEN "label present"
    ELSE "no label"
  END as stat
FROM [httparchive:runs.2014_08_15_requests_body]
WHERE mimeType CONTAINS "html"
  AND REGEXP_MATCH(LOWER(body), r'<menuitem')
GROUP BY stat
ORDER BY num desc
Row  num    stat
1    10062  label present
2    39     no label
Corrected: 8900 pages have a label present (so 29 have no label).
SELECT page, COUNT(*) as num
FROM [httparchive:runs.2014_08_15_requests_body]
WHERE mimeType CONTAINS "html"
AND REGEXP_MATCH(LOWER(body), r'<menuitem\s([^>]+\s)?label\s*=')
GROUP BY page
ORDER BY num desc
SELECT COUNT(*) as num,
  CASE
    WHEN REGEXP_MATCH(LOWER(body),
                      r'<menuitem[^>]*>(\s*[^<]+)+\s*</menuitem>')
      THEN "has content"
    ELSE "no content"
  END as stat
FROM [httparchive:runs.2014_08_15_requests_body]
WHERE mimeType CONTAINS "html"
  AND REGEXP_MATCH(LOWER(body), r'<menuitem')
GROUP BY stat
ORDER BY num desc
Row  num    stat
1    10101  no content
SELECT COUNT(*) as num,
  CASE
    WHEN REGEXP_MATCH(LOWER(body), r'</menuitem>') THEN "end tag"
    ELSE "no end tag"
  END as stat
FROM [httparchive:runs.2014_08_15_requests_body]
WHERE mimeType CONTAINS "html"
  AND REGEXP_MATCH(LOWER(body), r'<menuitem')
GROUP BY stat
ORDER BY num desc
Row  num    stat
1    10086  end tag
2    15     no end tag
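To try the three classifications above on local markup without BigQuery, something like the following Python sketch could be used. It assumes REGEXP_MATCH performs an unanchored search, which re.search mirrors here. The sample bodies are invented for illustration, and the helper names are hypothetical. The content pattern is simplified to [^<]+, which matches the same strings as (\s*[^<]+)+\s* because \s is a subset of [^<], and it avoids nested quantifiers that can backtrack badly in Python's re module.

```python
import re
from collections import Counter

# Patterns taken from the queries above, applied to lower-cased bodies.
HAS_MENUITEM = re.compile(r'<menuitem')
HAS_LABEL = re.compile(r'<menuitem\s([^>]+\s)?label\s*=')
HAS_END_TAG = re.compile(r'</menuitem>')
# Simplified but equivalent form of (\s*[^<]+)+\s* (see note above).
HAS_CONTENT = re.compile(r'<menuitem[^>]*>[^<]+</menuitem>')


def classify(body):
    """Return the three stats for one response body, or None if it has no <menuitem."""
    text = body.lower()
    if not HAS_MENUITEM.search(text):
        return None
    return {
        "label": "label present" if HAS_LABEL.search(text) else "no label",
        "content": "has content" if HAS_CONTENT.search(text) else "no content",
        "end": "end tag" if HAS_END_TAG.search(text) else "no end tag",
    }


def tally(bodies, key):
    """Count bodies per stat value, like GROUP BY stat in the queries."""
    results = (classify(b) for b in bodies)
    return Counter(r[key] for r in results if r is not None)


# Invented sample bodies, not taken from httparchive.
SAMPLES = [
    '<menu type="context"><menuitem label="Copy"></menuitem></menu>',
    '<menu><MENUITEM label="Paste"></menu>',
    '<menu><menuitem icon="x.png"> Cut </menuitem></menu>',
    '<p>no menus here</p>',
]
```

Note that, like the original pattern, the content check also matches whitespace-only text between the tags, and it only sees text that sits directly between a start tag and an end tag with no child elements.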
--
Simon Pieters
Opera Software