Re: [O] [patch][ox-html] Stylistic changes
On 2014-03-18 15:46, Rasmus wrote:
> Rick Frankel <r...@rickster.com> writes:
>> On 2014-03-17 23:36, Rasmus wrote:
>>
>> When you refer above to utf-8 entities, do you mean the named html
>> entities (e.g., &lt;) or the actual utf-8 encoded characters?
>
> The latter. Do M-x describe-char on such a character. Emacs will
> tell you the code points. My conjecture is therefore that one could
> write a script that would translate html values to these weird hex
> strings or code points. It would create uglier source output, but
> perhaps better for XHTML. Personally, I don't care about XHTML as I
> have little intuition as to when to use it...

Do you close the empty tags in your html (e.g., <br />, <hr />)? Then
you're using xhtml.

>> I believe the named entities are encoding independent, while
>> including encoded characters in html output is fine -- although
>> making sure the page is served with the correct character encoding
>> is another issue entirely.
>
> Not what I meant. I'm only addressing your concern about
> &HUMAN-READABLE-NAME; vs &%HEX-VALUE;.
>
>> As to using a more extensive set of named entities, as I said
>> above, the problem is that the xhtml flavors don't support them,
>> and I don't see any advantage in making the exporter handle
>> character encoding differently based on output doctype.
>
> Definitely not. That's why I ask whether there's a point in changing
> nice entities to ugly entities for the sake of not getting them in
> XHTML-encoded documents.

Yes we should. You can't properly post-process the html if it's
invalid xml. And the definitions of pretty and ugly are subjective.
The question is, do we want to generate valid (x)html or not? My vote
is yes. In our case, html is an output format and not a source format.
In fact, we should probably compress out unnecessary whitespace, etc.
the way other web generators do to make the smallest/most efficient
output for webserving.

>> As Nicolas would point out, you can always use a filter to map all
>> the entities in the output.
>
> With ox-latex.el we for instance don't include entities that are not
> supported by the default package alist. A similar concern could be
> at play here.

Agreed.
Re: [O] [patch][ox-html] Stylistic changes
On 2014-03-17 23:36, Rasmus wrote:
> Rick Frankel <r...@rickster.com> writes:
>> On Mon, Mar 17, 2014 at 11:19:27PM +0100, Rasmus wrote:
>>> Hi Rick,
>>>
>>> Rick Frankel <r...@rickster.com> writes:
>>>> On Mon, Mar 17, 2014 at 03:17:10AM +0100, Bastien wrote:
>>>>> Hello,
>>>>>
>>>>> Nicolas Goaziou <n.goaz...@gmail.com> writes:
>>>>>
>>>>>> So if the change is only stylistic, I see no reason to break
>>>>>> compatibility with ox-freemind.el.
>>>>>
>>>>> I would favor a solution where the HTML backend uses what's
>>>>> readable (&mdash; and friends) and where the Freemind backend
>>>>> deals with this.
>>>>>
>>>>> Maybe `org-html-special-string-regexps' could be a variable and
>>>>> Freemind could temporarily set it up to what it needs?
>>>>
>>>> The use of numeric vs. named entities is not just stylistic.
>>>> XHTML[45] only support the 5 basic named entities internally:
>>>>
>>>>   - &amp;  - the ampersand
>>>>   - &quot; - the double quote
>>>>   - &apos; - single quote '
>>>>   - &lt;   - less-than
>>>>   - &gt;   - greater-than
>>>>
>>>> So including any others will generate non-conforming output.
>>>>
>>>> Since the change is cosmetic, I don't see the purpose in adding a
>>>> lot of conditional code to the exporter to output different
>>>> entities for html[45] vs xhtml[45].
>>>
>>> AFAIK, we have a lot more entities in org-entities with
>>> &PRETTY-NAME;. When I've added entities, I've used a pretty name
>>> over a numeric value when I found one. What's your opinion on
>>> that? Should we go for readable or aim towards replacing them with
>>> these numeric values?
>>
>> We should use only those named entities (above) which are valid in
>> xhtml(5). So, yes, we should change to using numeric entities for
>> any other than the above.
>
> Since Emacs knows both the code points and the hex for utf8 entities
> it may be fairly simple to change the HTML representations, though I
> don't like it...

When you refer above to utf-8 entities, do you mean the named html
entities (e.g., &lt;) or the actual utf-8 encoded characters?

I believe the named entities are encoding independent, while including
encoded characters in html output is fine -- although making sure the
page is served with the correct character encoding is another issue
entirely.

As to using a more extensive set of named entities, as I said above,
the problem is that the xhtml flavors don't support them, and I don't
see any advantage in making the exporter handle character encoding
differently based on output doctype.

As Nicolas would point out, you can always use a filter to map all the
entities in the output.

rick
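The filter approach mentioned above could look roughly like the
following. This is a minimal sketch, not code from the thread: the
`my/' names and the four-entry table are assumptions made for
illustration, and it relies only on the standard
`org-export-filter-final-output-functions' hook, whose functions
receive the output string, the back-end name and the info plist.

```elisp
;;; Sketch only: every `my/' name and the entity table are assumptions.
(require 'ox)

(defvar my/html-entity-numeric-alist
  '(("&shy;"    . "&#x00ad;")
    ("&mdash;"  . "&#x2014;")
    ("&ndash;"  . "&#x2013;")
    ("&hellip;" . "&#x2026;"))
  "Named HTML entities and the numeric references to emit instead.")

(defun my/html-named-to-numeric (string)
  "Return STRING with the named entities in the table made numeric.
Entities that are not in the table, such as &amp;, are left alone."
  (let ((case-fold-search nil)          ; entity names are case-sensitive
        (regexp (regexp-opt (mapcar #'car my/html-entity-numeric-alist))))
    (replace-regexp-in-string
     regexp
     (lambda (match)
       (or (cdr (assoc match my/html-entity-numeric-alist)) match))
     string t t)))

(defun my/org-html-numeric-entities-filter (output backend _info)
  "Rewrite named entities in OUTPUT when BACKEND derives from HTML.
Returning nil for other back-ends leaves their output untouched."
  (when (org-export-derived-backend-p backend 'html)
    (my/html-named-to-numeric output)))

(add-to-list 'org-export-filter-final-output-functions
             #'my/org-html-numeric-entities-filter)
```

With something like this in a user's configuration, the exporter itself
could keep emitting whichever form it prefers, and anyone who needs
strictly XML-safe output would post-process it. A real table would
have to cover every named entity the export can produce.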
Re: [O] [patch][ox-html] Stylistic changes
Rick Frankel <r...@rickster.com> writes:

> On 2014-03-17 23:36, Rasmus wrote:
>
> When you refer above to utf-8 entities, do you mean the named html
> entities (e.g., &lt;) or the actual utf-8 encoded characters?

The latter. Do M-x describe-char on such a character. Emacs will tell
you the code points. My conjecture is therefore that one could write a
script that would translate html values to these weird hex strings or
code points. It would create uglier source output, but perhaps better
for XHTML. Personally, I don't care about XHTML as I have little
intuition as to when to use it...

> I believe the named entities are encoding independent, while
> including encoded characters in html output is fine -- although
> making sure the page is served with the correct character encoding
> is another issue entirely.

Not what I meant. I'm only addressing your concern about
&HUMAN-READABLE-NAME; vs &%HEX-VALUE;.

> As to using a more extensive set of named entities, as I said above,
> the problem is that the xhtml flavors don't support them, and I
> don't see any advantage in making the exporter handle character
> encoding differently based on output doctype.

Definitely not. That's why I ask whether there's a point in changing
nice entities to ugly entities for the sake of not getting them in
XHTML-encoded documents.

> As Nicolas would point out, you can always use a filter to map all
> the entities in the output.

With ox-latex.el we for instance don't include entities that are not
supported by the default package alist. A similar concern could be at
play here.

--Rasmus

--
El Rey ha muerto. ¡Larga vida al Rey!
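The conjecture above, that Emacs already knows the code point of each
character and could produce the numeric form from it, can be sketched
as follows. Both helper names are hypothetical and not part of Org;
the sketch only uses `format' and `mapconcat'.

```elisp
;;; Sketch only: both `my/' helpers are hypothetical illustrations.
(defun my/char-to-numeric-entity (char)
  "Return a hexadecimal HTML character reference for CHAR."
  (format "&#x%04x;" char))

(defun my/string-to-numeric-entities (string)
  "Return STRING with every non-ASCII character made a numeric reference."
  (mapconcat (lambda (char)
               (if (< char 128)
                   (char-to-string char)
                 (my/char-to-numeric-entity char)))
             string ""))

(my/char-to-numeric-entity ?\u2014)        ; => "&#x2014;"
(my/string-to-numeric-entities "a\u2014b") ; => "a&#x2014;b"
```

The value printed by describe-char for the em dash (U+2014) is the same
number that ends up in the reference, so no lookup table is needed for
this direction.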
Re: [O] [patch][ox-html] Stylistic changes
On Mon, Mar 17, 2014 at 03:17:10AM +0100, Bastien wrote:
> Hello,
>
> Nicolas Goaziou <n.goaz...@gmail.com> writes:
>
>> So if the change is only stylistic, I see no reason to break
>> compatibility with ox-freemind.el.
>
> I would favor a solution where the HTML backend uses what's readable
> (&mdash; and friends) and where the Freemind backend deals with
> this.
>
> Maybe `org-html-special-string-regexps' could be a variable and
> Freemind could temporarily set it up to what it needs?

The use of numeric vs. named entities is not just stylistic.
XHTML[45] only support the 5 basic named entities internally:

  - &amp;  - the ampersand
  - &quot; - the double quote
  - &apos; - single quote '
  - &lt;   - less-than
  - &gt;   - greater-than

So including any others will generate non-conforming output.

Since the change is cosmetic, I don't see the purpose in adding a lot
of conditional code to the exporter to output different entities for
html[45] vs xhtml[45].

rick
Re: [O] [patch][ox-html] Stylistic changes
Hi Rick,

Rick Frankel <r...@rickster.com> writes:

> On Mon, Mar 17, 2014 at 03:17:10AM +0100, Bastien wrote:
>> Hello,
>>
>> Nicolas Goaziou <n.goaz...@gmail.com> writes:
>>
>>> So if the change is only stylistic, I see no reason to break
>>> compatibility with ox-freemind.el.
>>
>> I would favor a solution where the HTML backend uses what's
>> readable (&mdash; and friends) and where the Freemind backend deals
>> with this.
>>
>> Maybe `org-html-special-string-regexps' could be a variable and
>> Freemind could temporarily set it up to what it needs?
>
> The use of numeric vs. named entities is not just stylistic.
> XHTML[45] only support the 5 basic named entities internally:
>
>   - &amp;  - the ampersand
>   - &quot; - the double quote
>   - &apos; - single quote '
>   - &lt;   - less-than
>   - &gt;   - greater-than
>
> So including any others will generate non-conforming output.
>
> Since the change is cosmetic, I don't see the purpose in adding a
> lot of conditional code to the exporter to output different entities
> for html[45] vs xhtml[45].

AFAIK, we have a lot more entities in org-entities with &PRETTY-NAME;.
When I've added entities, I've used a pretty name over a numeric value
when I found one. What's your opinion on that? Should we go for
readable or aim towards replacing them with these numeric values?

--Rasmus

--
With monopolies the cake is a lie!
Re: [O] [patch][ox-html] Stylistic changes
Hello,

Rasmus <ras...@gmx.us> writes:

> Here's a couple of minor changes for ox-html.

Thank you.

> First, I dropped the references to HTML5 hgroup since apparently W3
> did the same¹.

Applied.

> Second, for some reason ox-html replaces a couple of entities by
> itself, rather than letting org-entities do it, but uses hex
> references (or whatever), rather than a nice HTML character entity.
> The second patch fixes this. (I naively assume that there is not a
> reason for not using the pretty references).

According to the log of the commit introducing the changes, there is a
reason:

  commit f2b2c8318fa8c2ce82208d717c649377c856802c
  Author: Jambunathan K <kjambunat...@gmail.com>
  Date:   Sat Mar 2 11:00:46 2013 +0530

      Add Freemind Mindmap Back-End for Org Export Engine

      * contrib/lisp/ox-freemind.el: New file.

      * lisp/ox-html.el (org-html--tags, org-html-format-headline)
      (org-html--format-toc-headline, org-html-checkbox)
      (org-html-table-cell, org-html-timestamp)
      (org-html-verse-block, org-html-special-string-regexps): Replace
      named HTML entities with their numeric counterparts.  This keeps
      Freemind backend happy.

So if the change is only stylistic, I see no reason to break
compatibility with ox-freemind.el.

Regards,

--
Nicolas Goaziou
Re: [O] [patch][ox-html] Stylistic changes
Nicolas Goaziou <n.goaz...@gmail.com> writes:

>> Second, for some reason ox-html replaces a couple of entities by
>> itself, rather than letting org-entities do it, but uses hex
>> references (or whatever), rather than a nice HTML character entity.
>> The second patch fixes this. (I naively assume that there is not a
>> reason for not using the pretty references).
>
> According to the log of the commit introducing the changes, there is
> a reason:
>
>   commit f2b2c8318fa8c2ce82208d717c649377c856802c
>   Author: Jambunathan K <kjambunat...@gmail.com>
>   Date:   Sat Mar 2 11:00:46 2013 +0530
>
>       Add Freemind Mindmap Back-End for Org Export Engine
>
>       * contrib/lisp/ox-freemind.el: New file.
>
>       * lisp/ox-html.el (org-html--tags, org-html-format-headline)
>       (org-html--format-toc-headline, org-html-checkbox)
>       (org-html-table-cell, org-html-timestamp)
>       (org-html-verse-block, org-html-special-string-regexps): Replace
>       named HTML entities with their numeric counterparts.  This keeps
>       Freemind backend happy.
>
> So if the change is only stylistic, I see no reason to break
> compatibility with ox-freemind.el.

Obviously not. I should have checked with git-blame first, but I
honestly didn't remember that this tool existed.

Thanks!

--Rasmus

--
May the Force be with you
Re: [O] [patch][ox-html] Stylistic changes
Hello,

Nicolas Goaziou <n.goaz...@gmail.com> writes:

> So if the change is only stylistic, I see no reason to break
> compatibility with ox-freemind.el.

I would favor a solution where the HTML backend uses what's readable
(&mdash; and friends) and where the Freemind backend deals with this.

Maybe `org-html-special-string-regexps' could be a variable and
Freemind could temporarily set it up to what it needs?

--
Bastien
[O] [patch][ox-html] Stylistic changes
Hi,

Here's a couple of minor changes for ox-html.

First, I dropped the references to HTML5 hgroup since apparently W3
did the same¹.

Second, for some reason ox-html replaces a couple of entities by
itself, rather than letting org-entities do it, but uses hex
references (or whatever), rather than a nice HTML character entity.
The second patch fixes this. (I naively assume that there is not a
reason for not using the pretty references).

--Rasmus

Footnotes:
¹ e.g. http://html5doctor.com/the-hgroup-element/

--
May the Force be with you

From 8325901e959e16d34546ca7bf74d7efbc8e16825 Mon Sep 17 00:00:00 2001
From: Rasmus <w...@pank.eu>
Date: Sun, 16 Mar 2014 00:36:21 +0100
Subject: [PATCH 1/2] Remove reference to hgroup in ox-html

* ox-html.el (org-html-html5-elements): Drop reference to hgroup.
---
 lisp/ox-html.el | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/lisp/ox-html.el b/lisp/ox-html.el
index a8c924f..cb95161 100644
--- a/lisp/ox-html.el
+++ b/lisp/ox-html.el
@@ -169,10 +169,8 @@
     "progress" "section" "video")
   "New elements in html5.
 
-hgroup is not included because it's currently impossible to
-wrap special blocks around multiple headlines. For other blocks
-that should contain headlines, use the HTML_CONTAINER property on
-the headline itself.")
+For blocks that should contain headlines, use the HTML_CONTAINER
+property on the headline itself.")
 
 (defconst org-html-special-string-regexps
   '(("\\\\-" . "&#x00ad;")		; shy
-- 
1.9.0

From bd096d2040d4ffaa517466ac85c4e0da08863bec Mon Sep 17 00:00:00 2001
From: Rasmus <w...@pank.eu>
Date: Sun, 16 Mar 2014 00:54:11 +0100
Subject: [PATCH 2/2] Proper HTML entities for dashes, dots in ox-html

* ox-html.el (org-html-special-string-regexps): Use HTML entities.
---
 lisp/ox-html.el | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lisp/ox-html.el b/lisp/ox-html.el
index cb95161..8e22df6 100644
--- a/lisp/ox-html.el
+++ b/lisp/ox-html.el
@@ -173,10 +173,10 @@ For blocks that should contain headlines, use the HTML_CONTAINER
 property on the headline itself.")
 
 (defconst org-html-special-string-regexps
-  '(("\\\\-" . "&#x00ad;")		; shy
-    ("---\\([^-]\\)" . "&#x2014;\\1")	; mdash
-    ("--\\([^-]\\)" . "&#x2013;\\1")	; ndash
-    ("\\.\\.\\." . "&#x2026;"))		; hellip
+  '(("\\\\-" . "&shy;")		; shy
+    ("---\\([^-]\\)" . "&mdash;\\1")	; mdash
+    ("--\\([^-]\\)" . "&ndash;\\1")	; ndash
+    ("\\.\\.\\." . "&hellip;"))		; hellip
   "Regular expressions for special string conversion.")
 
 (defconst org-html-scripts
-- 
1.9.0