Re: [O] [patch][ox-html] Stylistic changes

2014-03-19 Thread Rick Frankel

On 2014-03-18 15:46, Rasmus wrote:

Rick Frankel r...@rickster.com writes:

On 2014-03-17 23:36, Rasmus wrote:
When you refer above to utf-8 entities, do you mean the named html
entities (e.g., &lt;) or the actual utf-8 encoded characters?

The latter.  Do M-x describe-char on such a character.  Emacs will
tell you the code points.  My conjecture is therefore that one could
write a script that would translate html values to these weird hex
strings or codepoints.  It would create uglier source output, but
perhaps better for XHTML.  Personally, I don't care about XHTML as I
have little intuition as to when to use it. . .


Do you close the empty tags in your html (e.g., <br />, <hr />)? Then
you're using xhtml.


I believe the named entities are encoding independent, while including
encoded characters in html output is fine -- although making sure the
page is served with the correct character encoding is another issue
entirely.

Not what I meant.  I'm only addressing your concern about
&HUMAN-READABLE-NAME; vs &%HEX-VALUE;.

As to using a more extensive set of named entities, as I said above,
the problem is that the xhtml flavors don't support them, and I don't
see any advantage in making the exporter handle character encoding
differently based on output doctype.

Definitely not.  Which is why I ask whether there's a point in changing
nice entities to ugly entities for the sake of not getting them in
XHTML-encoded documents.


Yes we should. You can't properly post-process the html if it's
invalid xml. And the definitions of pretty and ugly are subjective.

The question is, do we want to generate valid (x)html or not? My vote
is yes. In our case, html is an output format and not a source format.
In fact, we should probably compress out unnecessary whitespace, etc.
the way other web generators do to make the smallest/most efficient
output for webserving.


As Nicolas would point out, you can always use a filter to map all the
entities in the output.

With ox-latex.el we for instance don't include entities that are not
supported by the default package alist.  A similar concern could be at
play here.


Agreed.



Re: [O] [patch][ox-html] Stylistic changes

2014-03-18 Thread Rick Frankel

On 2014-03-17 23:36, Rasmus wrote:

Rick Frankel r...@rickster.com writes:

On Mon, Mar 17, 2014 at 11:19:27PM +0100, Rasmus wrote:
Hi Rick,


Rick Frankel r...@rickster.com writes:

 On Mon, Mar 17, 2014 at 03:17:10AM +0100, Bastien wrote:
 Hello,

 Nicolas Goaziou n.goaz...@gmail.com writes:

  So if the change is only stylistic, I see no reason to break
  compatibility with ox-freemind.el.

 I would favor a solution where the HTML backend uses what's
 readable (&mdash; and friends) and where the Freemind backend
 deals with this.

 Maybe `org-html-special-string-regexps' could be a variable
 and Freemind could temporarily set it up to what it needs?

 The use of numeric vs. named entities is not just stylistic.
 XHTML[45] only supports the 5 basic named entities internally:

   - &amp; - the ampersand &
   - &quot; - the double quote "
   - &apos; - the single quote '
   - &lt; - less-than <
   - &gt; - greater-than >

 So including any others will generate non-conforming output.
 Since the change is cosmetic, I don't see the purpose in adding a lot
 of conditional code to the exporter to output different entities for
 html[45] vs xhtml[45].

AFAIK, we have a lot more entities in org-entities with &PRETTY-NAME;.
When I've added entities, I've used a pretty name over a numeric value
when I found one.  What's your opinion on that?  Should we go for
readable or aim towards replacing them with these numeric values?

We should use only those named entities (above) which are valid in
xhtml(5). So, yes, we should change to using numeric entities for any
other than the above.

Since Emacs knows both the codepoints and the hex for utf8 entities it
may be fairly simple to change the HTML representations, though I
don't like it. . .


When you refer above to utf-8 entities, do you mean the named html
entities (e.g., &lt;) or the actual utf-8 encoded characters?

I believe the named entities are encoding independent, while including
encoded characters in html output is fine -- although making sure the
page is served with the correct character encoding is another issue
entirely.

As to using a more extensive set of named entities, as I said above,
the problem is that the xhtml flavors don't support them, and I don't
see any advantage in making the exporter handle character encoding
differently based on output doctype.

As Nicolas would point out, you can always use a filter to map all the
entities in the output.

rick



Re: [O] [patch][ox-html] Stylistic changes

2014-03-18 Thread Rasmus
Rick Frankel r...@rickster.com writes:

 On 2014-03-17 23:36, Rasmus wrote:
 When you refer above to utf-8 entities, do you mean the named html
 entities (e.g., &lt;) or the actual utf-8 encoded characters?

The latter.  Do M-x describe-char on such a character.  Emacs will
tell you the code points.  My conjecture is therefore that one could
write a script that would translate html values to these weird hex
strings or codepoints.  It would create uglier source output, but
perhaps better for XHTML.  Personally, I don't care about XHTML as I
have little intuition as to when to use it. . .

 I believe the named entities are encoding independent, while including
 encoded characters in html output is fine -- although making sure the
 page is served with the correct character encoding is another issue
 entirely.

Not what I meant.  I'm only addressing your concern about
&HUMAN-READABLE-NAME; vs &%HEX-VALUE;.

 As to using a more extensive set of named entities, as I said above,
 the problem is that the xhtml flavors don't support them, and I don't
 see any advantage in making the exporter handle character encoding
 differently based on output doctype.

Definitely not.  Which is why I ask whether there's a point in changing
nice entities to ugly entities for the sake of not getting them in
XHTML-encoded documents.

 As Nicolas would point out, you can always use a filter to map all the
 entities in the output.

With ox-latex.el we for instance don't include entities that are not
supported by the default package alist.  A similar concern could be at
play here.

–Rasmus

-- 
El Rey ha muerto. ¡Larga vida al Rey!
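
The script conjectured above, one that translates named HTML entities
into hex references, could look roughly like the following Python
sketch.  It is only an illustration, not part of Org: the function name
`named_to_numeric` is made up here, and the lookup table comes from
Python's standard `html.entities` module (the HTML 4 entity set), not
from org-entities.  The five entities that XML predefines are left
untouched.

```python
import re
from html.entities import name2codepoint

# Entities that every XML parser understands without a DTD; keep these as-is.
XML_PREDEFINED = {"amp", "quot", "apos", "lt", "gt"}

# Matches a named character reference such as &mdash; (but not &#x2014;).
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(html_text: str) -> str:
    """Replace named entities with hex references, e.g. &mdash; -> &#x2014;."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        # Leave XML's predefined entities and unknown names unchanged.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return f"&#x{name2codepoint[name]:04x};"

    return _ENTITY_RE.sub(replace, html_text)


if __name__ == "__main__":
    print(named_to_numeric("a&mdash;b &amp; c&hellip;"))
```

Run over exported HTML as a post-processing step, this leaves `&amp;`
and friends alone and rewrites everything else it recognises.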




Re: [O] [patch][ox-html] Stylistic changes

2014-03-17 Thread Rick Frankel
On Mon, Mar 17, 2014 at 03:17:10AM +0100, Bastien wrote:
 Hello,

 Nicolas Goaziou n.goaz...@gmail.com writes:

  So if the change is only stylistic, I see no reason to break
  compatibility with ox-freemind.el.

 I would favor a solution where the HTML backend uses what's
 readable (&mdash; and friends) and where the Freemind backend
 deals with this.

 Maybe `org-html-special-string-regexps' could be a variable
 and Freemind could temporarily set it up to what it needs?

The use of numeric vs. named entities is not just stylistic.
XHTML[45] only supports the 5 basic named entities internally:

  - &amp; - the ampersand &
  - &quot; - the double quote "
  - &apos; - the single quote '
  - &lt; - less-than <
  - &gt; - greater-than >

So including any others will generate non-conforming output.
Since the change is cosmetic, I don't see the purpose in adding a lot
of conditional code to the exporter to output different entities for
html[45] vs xhtml[45].

rick
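
The non-conformance described above can be reproduced with any XML
parser that does not load a DTD.  The following Python sketch uses the
standard library's `xml.etree.ElementTree` as a stand-in for such a
parser; the helper name `parses_as_xml` is invented for this example,
and the `<p>` wrapper is just a minimal fragment, not what ox-html
emits.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(fragment: str) -> bool:
    """Return True if FRAGMENT, wrapped in a <p> element, is well-formed XML."""
    try:
        ET.fromstring(f"<p>{fragment}</p>")
        return True
    except ET.ParseError:
        # Raised, among other things, for entities the parser does not know.
        return False


if __name__ == "__main__":
    for sample in ("a &amp; b", "a &mdash; b", "a &#x2014; b"):
        print(sample, "->", parses_as_xml(sample))
```

The predefined `&amp;` and the numeric `&#x2014;` are accepted, while
`&mdash;` is rejected as an undefined entity, which is the
post-processing problem with named entities in XML output.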



Re: [O] [patch][ox-html] Stylistic changes

2014-03-17 Thread Rasmus
Hi Rick,

Rick Frankel r...@rickster.com writes:

 On Mon, Mar 17, 2014 at 03:17:10AM +0100, Bastien wrote:
 Hello,

 Nicolas Goaziou n.goaz...@gmail.com writes:

  So if the change is only stylistic, I see no reason to break
  compatibility with ox-freemind.el.

 I would favor a solution where the HTML backend uses what's
 readable (&mdash; and friends) and where the Freemind backend
 deals with this.

 Maybe `org-html-special-string-regexps' could be a variable
 and Freemind could temporarily set it up to what it needs?

 The use of numeric vs. named entities is not just stylistic.
 XHTML[45] only supports the 5 basic named entities internally:

   - &amp; - the ampersand &
   - &quot; - the double quote "
   - &apos; - the single quote '
   - &lt; - less-than <
   - &gt; - greater-than >

 So including any others will generate non-conforming output.
 Since the change is cosmetic, I don't see the purpose in adding a lot
 of conditional code to the exporter to output different entities for
 html[45] vs xhtml[45].

AFAIK, we have a lot more entities in org-entities with &PRETTY-NAME;.
When I've added entities, I've used a pretty name over a numeric value
when I found one.  What's your opinion on that?  Should we go for
readable or aim towards replacing them with these numeric values?

—Rasmus

-- 
With monopolies the cake is a lie!



Re: [O] [patch][ox-html] Stylistic changes

2014-03-16 Thread Nicolas Goaziou
Hello,

Rasmus ras...@gmx.us writes:

 Here's a couple of minor changes for ox-html.

Thank you.

 First, I dropped the references to HTML5 hgroup since apparently W3
 did the same¹.

Applied.

 Second, for some reason ox-html replaces a couple of entities by
 itself—rather than letting org-entities do it—but uses hex references
 (or whatever), rather than a nice HTML character entity.  The second
 patch fixes this.  (I naively assume that there is not a reason for
 not using the pretty references).

According to the log of the commit introducing the changes, there is
a reason:


  commit f2b2c8318fa8c2ce82208d717c649377c856802c
  Author: Jambunathan K kjambunat...@gmail.com
  Date:   Sat Mar 2 11:00:46 2013 +0530

  Add Freemind Mindmap Back-End for Org Export Engine
  
  * contrib/lisp/ox-freemind.el: New file.
  
  * lisp/ox-html.el (org-html--tags, org-html-format-headline)
  (org-html--format-toc-headline, org-html-checkbox)
  (org-html-table-cell, org-html-timestamp)
  (org-html-verse-block, org-html-special-string-regexps):
  Replace named HTML entities with their numeric counterparts.
  This keeps Freemind backend happy.

So if the change is only stylistic, I see no reason to break
compatibility with ox-freemind.el.


Regards,

-- 
Nicolas Goaziou



Re: [O] [patch][ox-html] Stylistic changes

2014-03-16 Thread Rasmus
Nicolas Goaziou n.goaz...@gmail.com writes:

 Second, for some reason ox-html replaces a couple of entities by
 itself—rather than letting org-entities do it—but uses hex references
 (or whatever), rather than a nice HTML character entity.  The second
 patch fixes this.  (I naively assume that there is not a reason for
 not using the pretty references).

 According to the log of the commit introducing the changes, there is
 a reason:


   commit f2b2c8318fa8c2ce82208d717c649377c856802c
   Author: Jambunathan K kjambunat...@gmail.com
   Date:   Sat Mar 2 11:00:46 2013 +0530

   Add Freemind Mindmap Back-End for Org Export Engine
   
   * contrib/lisp/ox-freemind.el: New file.
   
   * lisp/ox-html.el (org-html--tags, org-html-format-headline)
   (org-html--format-toc-headline, org-html-checkbox)
   (org-html-table-cell, org-html-timestamp)
   (org-html-verse-block, org-html-special-string-regexps):
   Replace named HTML entities with their numeric counterparts.
   This keeps Freemind backend happy.

 So if the change is only stylistic, I see no reason to break
 compatibility with ox-freemind.el.

Obviously not.  I should have checked with git-blame first, but I
honestly didn't remember that this tool existed.  Thanks!

—Rasmus

-- 
May the Force be with you



Re: [O] [patch][ox-html] Stylistic changes

2014-03-16 Thread Bastien
Hello,

Nicolas Goaziou n.goaz...@gmail.com writes:

 So if the change is only stylistic, I see no reason to break
 compatibility with ox-freemind.el.

I would favor a solution where the HTML backend uses what's
readable (&mdash; and friends) and where the Freemind backend
deals with this.

Maybe `org-html-special-string-regexps' could be a variable
and Freemind could temporarily set it up to what it needs?

-- 
 Bastien



[O] [patch][ox-html] Stylistic changes

2014-03-15 Thread Rasmus
Hi,

Here's a couple of minor changes for ox-html.

First, I dropped the references to HTML5 hgroup since apparently W3
did the same¹.

Second, for some reason ox-html replaces a couple of entities by
itself—rather than letting org-entities do it—but uses hex references
(or whatever), rather than a nice HTML character entity.  The second
patch fixes this.  (I naively assume that there is not a reason for
not using the pretty references).

—Rasmus

Footnotes: 
¹   e.g. http://html5doctor.com/the-hgroup-element/

-- 
May the Force be with you
From 8325901e959e16d34546ca7bf74d7efbc8e16825 Mon Sep 17 00:00:00 2001
From: Rasmus w...@pank.eu
Date: Sun, 16 Mar 2014 00:36:21 +0100
Subject: [PATCH 1/2] Remove reference to hgroup in ox-html

* ox-html.el (org-html-html5-elements): Drop reference to hgroup.
---
 lisp/ox-html.el | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/lisp/ox-html.el b/lisp/ox-html.el
index a8c924f..cb95161 100644
--- a/lisp/ox-html.el
+++ b/lisp/ox-html.el
@@ -169,10 +169,8 @@
 "progress" "section" "video")
   "New elements in html5.
 
-<hgroup> is not included because it's currently impossible to
-wrap special blocks around multiple headlines. For other blocks
-that should contain headlines, use the HTML_CONTAINER property on
-the headline itself.")
+For blocks that should contain headlines, use the HTML_CONTAINER
+property on the headline itself.")
 
 (defconst org-html-special-string-regexps
   '(("\\\\-" . "&#x00ad;")		; shy
-- 
1.9.0

From bd096d2040d4ffaa517466ac85c4e0da08863bec Mon Sep 17 00:00:00 2001
From: Rasmus w...@pank.eu
Date: Sun, 16 Mar 2014 00:54:11 +0100
Subject: [PATCH 2/2] Proper HTML entities for dashes, dots in ox-html

* ox-html.el (org-html-special-string-regexps): Use HTML entities.
---
 lisp/ox-html.el | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lisp/ox-html.el b/lisp/ox-html.el
index cb95161..8e22df6 100644
--- a/lisp/ox-html.el
+++ b/lisp/ox-html.el
@@ -173,10 +173,10 @@ For blocks that should contain headlines, use the HTML_CONTAINER
 property on the headline itself.")
 
 (defconst org-html-special-string-regexps
-  '(("\\\\-" . "&#x00ad;")		; shy
-    ("---\\([^-]\\)" . "&#x2014;\\1")	; mdash
-    ("--\\([^-]\\)" . "&#x2013;\\1")	; ndash
-    ("\\.\\.\\." . "&#x2026;"))		; hellip
+  '(("\\\\-" . "&shy;")		; shy
+    ("---\\([^-]\\)" . "&mdash;\\1")	; mdash
+    ("--\\([^-]\\)" . "&ndash;\\1")	; ndash
+    ("\\.\\.\\." . "&hellip;"))		; hellip
   "Regular expressions for special string conversion.")
 
 (defconst org-html-scripts
-- 
1.9.0
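
To make the effect of the second patch concrete, here is a small Python
sketch of how a table like `org-html-special-string-regexps` behaves
when its pairs are applied in order.  It is an approximation written
for this thread, not a port of ox-html: the Emacs regexps are
transcribed into Python syntax, and the names `NUMERIC_TABLE`,
`NAMED_TABLE` and `convert_special_strings` are made up.  Passing the
table as a parameter loosely mirrors the suggestion, made elsewhere in
the thread, that a back-end such as Freemind could supply its own
table.

```python
import re

# Pre-patch table: numeric references (what the Freemind back-end relies on).
NUMERIC_TABLE = [
    (r"\\-", "&#x00ad;"),            # shy: a literal backslash followed by "-"
    (r"---([^-])", r"&#x2014;\1"),   # mdash
    (r"--([^-])", r"&#x2013;\1"),    # ndash
    (r"\.\.\.", "&#x2026;"),         # hellip
]

# Post-patch table: the same patterns with named entities.
NAMED_TABLE = [
    (r"\\-", "&shy;"),
    (r"---([^-])", r"&mdash;\1"),
    (r"--([^-])", r"&ndash;\1"),
    (r"\.\.\.", "&hellip;"),
]


def convert_special_strings(text: str, table=NAMED_TABLE) -> str:
    """Apply each (pattern, replacement) pair of TABLE to TEXT, in order."""
    for pattern, replacement in table:
        text = re.sub(pattern, replacement, text)
    return text


if __name__ == "__main__":
    sample = "a---b, pages 1--2, and so on..."
    print(convert_special_strings(sample, NAMED_TABLE))
    print(convert_special_strings(sample, NUMERIC_TABLE))
```

The mdash pattern has to come before the ndash pattern, otherwise the
first two dashes of `---` would already be consumed as an ndash.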