Re: [O] Smart quotes not working with some languages?

2013-11-14 Thread Daniil Frumin
Thanks!


I prepared a patch as you've suggested (attached).


On Thu, Nov 14, 2013 at 3:15 AM, Rasmus ras...@gmx.us wrote:

 Hi Daniil,

 Daniil Frumin difru...@gmail.com writes:

  Hi! I am using the latest org from git and I can't get org to export (I
  need LaTeX export particularly) files with smart quotes.

 You need to add it to the variable org-export-smart-quotes-alist
 defined in ox.el.  Put your cursor on the variable and do C-h v to
 read about it.  Once you have added support for Russian, please make a
 patch and submit it here so that other people can benefit from your
 labor.

 Thanks,
 Rasmus

 --
 Dung makes an excellent fertilizer





-- 
Sincerely yours,
-- Daniil


0001-Adding-support-for-russian-to-smart-quotes-exporter.patch
Description: Binary data


Re: [O] Smart quotes not working with some languages?

2013-11-14 Thread Rasmus
Ciao Daniil,

Daniil Frumin difru...@gmail.com writes:

 I prepared a patch as you've suggested (attached).

Are the Unicode entities correct?  In my UTF-8 Emacs they look like,
e.g., \302\253.  It could be OK, though, and I could just be missing a
font that has these glyphs.

Also, please change the commit message as described here

  http://orgmode.org/worg/org-contribute.html#sec-5

(Hint: git rebase -i origin/master (probably you already know more git
than me, but on the off chance that you don't I've included a suitable
command)).

If you haven't signed FSF papers you should end the patch with
TINYCHANGE.

When resubmitting append your message with [PATCH].

Thanks again,
Rasmus

-- 
⠠⠵



Re: [O] Smart quotes not working with some languages?

2013-11-13 Thread Rasmus
Hi Daniil,

Daniil Frumin difru...@gmail.com writes:

 Hi! I am using the latest org from git and I can't get org to export (I
 need LaTeX export particularly) files with smart quotes.

You need to add it to the variable org-export-smart-quotes-alist
defined in ox.el.  Put your cursor on the variable and do C-h v to
read about it.  Once you have added support for Russian, please make a
patch and submit it here so that other people can benefit from your
labor.
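
As an illustration of what such an addition could look like, here is a
minimal sketch of a Russian entry.  The key names (opening-double-quote
and so on), the property-list layout and the LaTeX strings are
assumptions modeled on how entries in ox.el looked around this time;
they have differed between Org versions, so compare with the value
shown by C-h v before relying on them.  The variable
my/smart-quotes-ru is hypothetical.

```emacs-lisp
;; Hypothetical entry; key names and replacement strings are assumptions.
(defvar my/smart-quotes-ru
  '("ru"
    (opening-double-quote :utf-8 "«" :html "&laquo;" :latex "{}<<")
    (closing-double-quote :utf-8 "»" :html "&raquo;" :latex ">>{}")
    (opening-single-quote :utf-8 "„" :html "&bdquo;" :latex "\\glqq{}")
    (closing-single-quote :utf-8 "“" :html "&ldquo;" :latex "\\grqq{}")
    (apostrophe :utf-8 "’" :html "&rsquo;"))
  "Sketch of a smart-quotes entry for Russian.")

;; Register the entry once the export framework is loaded.
(with-eval-after-load 'ox
  (add-to-list 'org-export-smart-quotes-alist my/smart-quotes-ru))
```

The LaTeX strings assume a font encoding in which << and >> form
guillemet ligatures; with another setup they would need adjusting.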

Thanks,
Rasmus

-- 
Dung makes an excellent fertilizer




Re: [O] Smart Quotes Exporting

2012-08-07 Thread Bastien
Hi Mark and Nicolas,

in the patchwork¹, I've marked patches² related to this discussion as
Not Applicable.

If there is progress made on this front, please send updated patches.
If there is a patch below that I should apply, please let me know.

Thanks!

¹ http://patchwork.newartisans.com/project/org-mode/list/
² Here are the patches:

http://patchwork.newartisans.com/patch/1330/
http://patchwork.newartisans.com/patch/1344/
http://patchwork.newartisans.com/patch/1346/
http://patchwork.newartisans.com/patch/1348/

-- 
 Bastien



Re: [O] Smart Quotes Exporting

2012-06-19 Thread Nicolas Goaziou
Hello,

Mark Shoulson m...@kli.org writes:

 Well, wait; regexps can make some pretty darn good guesses at the beginnings 
 or ends of strings.

I know that. They do a good job, I just want a better one.

 This isn't quite it; beginning-of-string followed by quote, then punctuation 
 and then spaces is also a close-quote, etc... There is a lot of fine-tuning.  
 But even what I currently have was able to handle your 

 Caesar said, "/Alea Jacta est./"

 example.

No, it doesn't handle that, actually, it's just sheer luck.  Indeed, the
quoting function is applied to "\"".  There's absolutely no space,
punctuation, etc. to save the day.  So it makes a wild guess with
a probability of 0.5 of success.  Since the guess is always the same,
"/a/" will always fail.

 The case where a quote both sits at the edge of a string (i.e. at the border 
 of some element, formatting, etc) *and* does not have whitespace next to it, 
 with possible punctuation, does not seem to be a normal occurrence to me.  If 
 I'm wrong, how common *is* it?

Even if it rarely happens, it can be _very_ annoying to have to cope
with bad guesses. If it can be avoided, I see no reason not to do so.

Now, here is the infrastructure I propose.

Internally, the two following functions are required.

#+begin_src emacs-lisp
(defun org-export--smart-quotes-in-element (element backend)
  "Replace plain quotes with smart quotes in ELEMENT.

ELEMENT is an Org element or a secondary string.  BACKEND is the
back-end to check for rules, as a symbol.

This is a destructive operation.  Return new element."
  (let* ((type (org-element-type element))
         (properties (and type (nth 1 element))))
    ;; Destructively apply changes to secondary string, if any.
    (let ((secondary (and type (assq type org-element-secondary-value-alist))))
      (when secondary
        (let* ((sec-symbol (cdr secondary))
               (sec-value (plist-get properties sec-symbol)))
          (when sec-value
            (setq properties
                  (plist-put properties
                             sec-symbol
                             (org-export--smart-quotes-in-element
                              sec-value backend)))))))
    ;; Destructively change `:caption' if present.  Since it's a dual
    ;; keyword, apply smart quotes to both CAR and CDR, if required.
    (let ((caption (plist-get properties :caption)))
      (when caption
        (setq properties
              (plist-put properties
                         :caption
                         (cons
                          (org-export--smart-quotes-in-element
                           (car caption) backend)
                          (and (cdr caption)
                               (org-export--smart-quotes-in-element
                                (cdr caption) backend)))))))
    ;; Recursively apply changes to contents.  Rebuild ELEMENT along
    ;; the way, with updated strings.
    (let ((contents (if type (org-element-contents element) element))
          previous current next acc)
      (while contents
        ;; Remember the item handled last before moving on.
        (setq previous current
              current (pop contents)
              next (car contents))
        (push
         (cond ((stringp current)
                ;; CURRENT is a string: Call
                ;; `org-export-quotation-marks' with appropriate
                ;; information.
                (org-export-quotation-marks
                 current
                 (and previous
                      (if (stringp previous)
                          (length (and (string-match " +\\'" previous)
                                       (match-string 0 previous)))
                        (org-element-property :post-blank previous)))
                 (and next
                      (if (not (stringp next)) 0
                        (length (and (string-match "\\` +" next)
                                     (match-string 0 next)))))
                 backend))
               ;; CURRENT is recursive: Move into it.
               ((plist-get properties :contents-begin)
                (org-export--smart-quotes-in-element current backend))
               ;; Otherwise, just accumulate CURRENT.
               (t current))
         acc))
      ;; Re-build transformed element.
      (if (or (not type) (eq type 'plain-text)) (nreverse acc)
        (nconc (list type properties) (nreverse acc))))))

(defun org-export-set-smart-quotes (tree backend info)
  "Replace plain quotes with smart quotes in TREE.

BACKEND is the back-end, as a symbol, used for transcoding.  INFO
is a plist used as a communication channel.

This is a destructive operation.  This function is meant to be
used as a parse tree filter for back-ends activating smart
quotes."
  ;; Destructively apply smart quotes to parsed keywords in info.
  (let ((value (plist-get info :title)))
    (when value
      (setq info
            (plist-put info
                       :title
                       (org-export--smart-quotes-in-element value backend)))))
  ;; 

Re: [O] Smart Quotes Exporting

2012-06-15 Thread Mark Shoulson
Nicolas Goaziou n.goaziou at gmail.com writes:

 
 Hello,
 
 Mark Shoulson mark at kli.org writes:
 
  ASCII exporter also handles UTF-8. So it's good to have there too.
 
  Really?  I would have thought ASCII meant ASCII, as in 7-bit clean
  text.
 
 org-e-ascii.el (as old org-ascii.el) handles ASCII, Latin1 and UTF-8
 encodings.

I noticed that after writing my response.  The name just threw me a little.  
Yes, that exporter needs to handle it too.

  It looked to me like your solution would essentially boil down to do
  string handling when there's a string, otherwise recur down and find
  the strings, which essentially means apply it to all the
  strings... and there were already functions out there applying things
  to strings, so this can just ride along with them.  Here, let's look
  at your suggestion and see if we can find what I missed:
 

  So, if it's a string, use the regexps (if they can be smart enough to look at
  beginning and end of the string, which they can--though I haven't been using the
  :post-blank property so presumably something is amiss), and if it isn't a
  string, recur down until you get to a string... Ah, but only if it's in
  org-element-recursive-objects.
 
 You're missing an important part: the regexps cannot be smart enough for
 quotes at the beginning or the end of the string. There, you must look
 outside the string. Hence:

Well, wait; regexps can make some pretty darn good guesses at the beginnings 
or ends of strings.  Quotations don't normally end in spaces (in the 
conventions used with ; French typography is different, but if you're using 
spaces around your quotes you have worse problems (line-breaks) to worry 
about).  So if a string ends in space(s) followed by a quote, it's very likely 
that quote is an open-quote for some stuff that comes after.  Conversely, if a 
string starts with a quote followed by some spaces, it's very likely a close-
quote to what went on before.
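
The two edge rules described above can be sketched as follows.  This is
only an illustration, not the code from the patch; the function name
my/guess-edge-quotes is hypothetical, and it looks at nothing but the
string itself.

```emacs-lisp
(defun my/guess-edge-quotes (string)
  "Guess the role of double quotes at the edges of STRING.
Return a plist (:first ROLE :last ROLE) where ROLE is `open',
`close', or nil when the string alone gives no hint."
  (list
   ;; A leading quote followed by blanks probably closes what came before.
   :first (and (string-match "\\`\"[ \t]+" string) 'close)
   ;; A trailing quote preceded by blanks probably opens what comes next.
   :last (and (string-match "[ \t]+\"\\'" string) 'open)))

(my/guess-edge-quotes "Caesar said, \"")  ; => (:first nil :last open)
(my/guess-edge-quotes "\"")               ; => (:first nil :last nil)
```

The second call shows the case under discussion: a string consisting
of a lone quote offers no whitespace to go on.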

This isn't quite it; beginning-of-string followed by quote, then punctuation 
and then spaces is also a close-quote, etc... There is a lot of fine-tuning.  
But even what I currently have was able to handle your 

Caesar said, "/Alea Jacta est./"

example.  Yes, there are edge-cases which this won't catch, and it remains to 
be seen how pervasive and annoying those are.  It may be that repeated 
tweaking of regexps will handle enough of the ordinary cases.  It may be that 
after a few rounds of regexp-hacking someone will finally decide that regexp-
hacking just won't handle enough of the important cases.  But I think even as 
it stands now we'd probably handle 80-90% of the normal situations, which 
really is as much as we reasonably can hope for.

Could I trouble someone to apply my patch, try it out, and see just how
good or bad the performance is?  It seems to work 
okay for the cases I've been trying, but maybe my dataset isn't robust 
enough.  Let's give it a test and see how many actual cases in common usage 
it gets wrong.  Maybe see how much can be fixed by tuning regexps.

 
  ]  1. If it has a quote as its first or last position, check for
  ] objects before or after the string to guess its status. An
  ] object never starts with a white space, but you may have to
  ] check :post-blank property in order to know if previous object
  ] had white spaces at its end.
 
 But you can only do that from the element containing the string, not
 from the string itself.

The case where a quote both sits at the edge of a string (i.e. at the border 
of some element, formatting, etc) *and* does not have whitespace next to it, 
with possible punctuation, does not seem to be a normal occurrence to me.  If 
I'm wrong, how common *is* it?

 
  So the issue with the current state is that it
  would wind up applying to too much? (it would hit code and verbatim elements,
  for example, and that would be wrong.)
 
 No, you are not applying it too much (verbatim elements don't contain
 plain-text objects) but your function hasn't got access to enough
 information to be useful.

The on-screen version, of course, will have to be smarter and check for 
the face formatting to make sure it doesn't happen in comments or verbatims; 
I am pretty sure it does not do that yet.
 
  wait, called on the top-level parsed tree object, recursively doing
  its thing before(?) the transcoders of the individual objects get to
  it.
 
 That's called a parse tree filter. That should be a possibility
 indeed. The function would be applied on the parse tree and would
 replace strings within elements containing plain text (that is
 paragraph, verse-block and table-row types). parse tree filters are
 applied very early in the export process.
 
 Another option would be to integrate it into
 `org-element-normalize-contents', but I think the previous way is
 better.

Maybe.  I know it sounds like I'm fixated on the plain-text solution, but I'm 
not convinced the 

Re: [O] Smart Quotes Exporting

2012-06-12 Thread Nicolas Goaziou
Hello,

Mark Shoulson m...@kli.org writes:

 ASCII exporter also handles UTF-8. So it's good to have there too.

 Really?  I would have thought ASCII meant ASCII, as in 7-bit clean
 text.

org-e-ascii.el (as old org-ascii.el) handles ASCII, Latin1 and UTF-8
encodings.

 It looked to me like your solution would essentially boil down to do
 string handling when there's a string, otherwise recur down and find
 the strings, which essentially means apply it to all the
 strings... and there were already functions out there applying things
 to strings, so this can just ride along with them.  Here, let's look
 at your suggestion and see if we can find what I missed:

 ] Walk element/object/secondary-string's contents .
 ] 
 ]   1. When a string is encountered:
 ]
 ]  1. If it has a quote as its first or last position, check for
 ] objects before or after the string to guess its status. An
 ] object never starts with a white space, but you may have to
 ] check :post-blank property in order to know if previous object
 ] had white spaces at its end.
 ]
 ]  2. For each quote everywhere else in the string, your regexp can
 ] handle it fine.
 ]
 ]   2. When an object belonging to `org-element-recursive-objects' is
 ]  encountered, apply the function to this object.
 ]
 ]   3. Accumulate returned strings or objects.
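
Step 1.1 of the outline above can be sketched on toy data.  This is
not the proposed implementation: the names my/trailing-blank and
my/edge-quote-roles are hypothetical, and objects are represented as
plain lists of the form (TYPE :post-blank N) rather than real parsed
Org objects.

```emacs-lisp
(defun my/trailing-blank (item)
  "Return the number of blanks ending ITEM.
ITEM is a string or a toy object of the form (TYPE :post-blank N)."
  (if (stringp item)
      (if (string-match " +\\'" item) (length (match-string 0 item)) 0)
    (or (plist-get (cdr item) :post-blank) 0)))

(defun my/edge-quote-roles (contents)
  "Guess roles of quotes at string edges in CONTENTS.
CONTENTS is a list of strings and toy objects.  Return one
cons (LEADING . TRAILING) per string; each is `open', `close',
or nil when that edge is not a double quote."
  (let (previous roles)
    (while contents
      (let ((current (pop contents))
            (next (car contents)))
        (when (stringp current)
          (push
           (cons
            ;; Leading quote: it opens unless it directly follows
            ;; a sibling with no blank in between.
            (and (string-match "\\`\"" current)
                 (if (or (null previous)
                         (> (my/trailing-blank previous) 0))
                     'open
                   'close))
            ;; Trailing quote: it closes unless non-blank material
            ;; follows immediately.
            (and (string-match "\"\\'" current)
                 (cond ((null next) 'close)
                       ((stringp next)
                        (if (string-match "\\` " next) 'close 'open))
                       (t 'open))))
           roles))
        (setq previous current)))
    (nreverse roles)))

(my/edge-quote-roles '("Caesar said, \"" (italic :post-blank 0) "\""))
;; => ((nil . open) (close . close))
```

The point of the sketch is that the lone quote after the italic object
is resolved by looking at its siblings, which the string by itself
cannot do.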

 So, if it's a string, use the regexps (if they can be smart enough to look at
 beginning and end of the string, which they can--though I haven't been using the
 :post-blank property so presumably something is amiss), and if it isn't a
 string, recur down until you get to a string... Ah, but only if it's in
 org-element-recursive-objects.

You're missing an important part: the regexps cannot be smart enough for
quotes at the beginning or the end of the string. There, you must look
outside the string. Hence:

 ]  1. If it has a quote as its first or last position, check for
 ] objects before or after the string to guess its status. An
 ] object never starts with a white space, but you may have to
 ] check :post-blank property in order to know if previous object
 ] had white spaces at its end.

But you can only do that from the element containing the string, not
from the string itself.

 So the issue with the current state is that it
 would wind up applying to too much? (it would hit code and verbatim elements,
 for example, and that would be wrong.)

No, you are not applying it too much (verbatim elements don't contain
plain-text objects) but your function hasn't got access to enough
information to be useful.

 So it remains to find the right place in the processing to put
 a function like the one you describe.  I'm trying to get a proper
 understanding of the code structure to see what you mean.  Looks like
 it should be something like a transcoder, only called on
 everything... 

Transcoders are type specific, so that's not an option.

 wait, called on the top-level parsed tree object, recursively doing
 its thing before(?) the transcoders of the individual objects get to
 it.

That's called a parse tree filter. That should be a possibility
indeed. The function would be applied on the parse tree and would
replace strings within elements containing plain text (that is
paragraph, verse-block and table-row types). parse tree filters are
applied very early in the export process.
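
To show where such a function would plug in, here is a minimal sketch
of a parse tree filter.  It assumes the filter variable is named
org-export-filter-parse-tree-functions, as in the ox.el of later Org
releases, and that filters receive the tree, the back-end and the info
plist.  The function my/report-straight-quotes is hypothetical and
only counts strings; it does not replace anything.

```emacs-lisp
(defun my/report-straight-quotes (tree backend info)
  "Count plain-text strings in TREE containing a straight double quote.
Return TREE unchanged, since a parse tree filter must return a tree."
  (let ((count 0))
    (org-element-map tree 'plain-text
      (lambda (text)
        (when (string-match-p "\"" text)
          (setq count (1+ count))))
      info)
    (message "%d string(s) with straight quotes (%s back-end)" count backend)
    tree))

;; Assumed hook name; the filter runs on the whole tree before transcoding.
(with-eval-after-load 'ox
  (add-hook 'org-export-filter-parse-tree-functions
            #'my/report-straight-quotes))
```

A real smart-quotes filter would rewrite the strings at this stage
instead of counting them.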

Another option would be to integrate it into
`org-element-normalize-contents', but I think the previous way is
better.

 The on-screen one would still use the plain-string computation, as you said,
 since the full parse isn't available.

Yes.

 It would also need to be tweaked not to act on verbatim/comment text,
 etc.

Yes. You may want to use `org-element-at-point' and `org-element-type'
to tell if you're somewhere smart quotes are allowed (in table,
table-row, paragraph, verse-block elements).
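
A minimal sketch of such a check, assuming org-element is available
and the buffer is in Org mode.  The predicate name
my/smart-quotes-allowed-p is hypothetical.

```emacs-lisp
(require 'org-element)

(defun my/smart-quotes-allowed-p ()
  "Return non-nil when point is in an element where smart quotes apply."
  (memq (org-element-type (org-element-at-point))
        '(paragraph table table-row verse-block)))
```

On-screen fontification could call this before altering a quote, so
that example blocks, comments and the like are left alone.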


Regards,

-- 
Nicolas Goaziou



Re: [O] Smart Quotes Exporting

2012-06-10 Thread Mark Shoulson
Nicolas Goaziou n.goaziou at gmail.com writes:

 
 Hello,
 
 Mark E. Shoulson mark at kli.org writes:
 
  Update on the smart-quotes patch.  Supports the odt exporter now too,
  which I think covers all the current major new exporters for which
  it is relevant (adding smart quotes to ASCII export is a contradiction
  in terms;
 
 ASCII exporter also handles UTF-8. So it's good to have there too.

Really?  I would have thought ASCII meant ASCII, as in 7-bit clean text.  More
of a plain text exporter then.  Fair enough.  I'll work it in.

  should it be in the publish exporter?  It didn't look like it to
  me).
 
 No.

OK, good.

 
  Added an options keyword, '"' (that is, the double-quote mark) to
  select smart quotes on/off, and a defcustom for customizing your
  default.  Set the default default [sic] to nil, though actually it
  might be reasonable to set it to t.  Slight touch-up to the regexps
  since last time, but they will definitely be subject to a lot of
  fine-tuning as more special cases are found that break them and ways
  to fix it are found (the close-quote still breaks on one of /a/. or
  /a./)
 
 Again, using regexps on plain text objects is a wrong approach, as you
 need a better understanding of the whole paragraph structure to handle
 quotes properly. I already suggested a possible solution, is there anything
 wrong with it?

It looked to me like your solution would essentially boil down to do string
handling when there's a string, otherwise recur down and find the strings,
which essentially means apply it to all the strings... and there were already
functions out there applying things to strings, so this can just ride along with
them.  Here, let's look at your suggestion and see if we can find what I missed:

] Walk element/object/secondary-string's contents .
] 
]   1. When a string is encountered:
]
]  1. If it has a quote as its first or last position, check for
] objects before or after the string to guess its status. An
] object never starts with a white space, but you may have to
] check :post-blank property in order to know if previous object
] had white spaces at its end.
]
]  2. For each quote everywhere else in the string, your regexp can
] handle it fine.
]
]   2. When an object belonging to `org-element-recursive-objects' is
]  encountered, apply the function to this object.
]
]   3. Accumulate returned strings or objects.

So, if it's a string, use the regexps (if they can be smart enough to look at
beginning and end of the string, which they can--though I haven't been using the
:post-blank property so presumably something is amiss), and if it isn't a
string, recur down until you get to a string... Ah, but only if it's in
org-element-recursive-objects.  So the issue with the current state is that it
would wind up applying to too much? (it would hit code and verbatim elements,
for example, and that would be wrong.)  And detecting such things at the string
level would be the wrong place... So it remains to find the right place in the
processing to put a function like the one you describe.  I'm trying to get a
proper understanding of the code structure to see what you mean.  Looks like it
should be something like a transcoder, only called on everything... wait, called
on the top-level parsed tree object, recursively doing its thing before(?) the
transcoders of the individual objects get to it.  So almost something replacing
the (lambda (blob contents info) contents) stub in org-export-transcoder; does
that make sense to you? Otherwise, called somehow in org-export-data.  In either
case made a hook of some kind so that it is backend-specific.

Does it sound like I am understanding this right, to you?

The on-screen one would still use the plain-string computation, as you said,
since the full parse isn't available.  And that seems to work okay (the export
works okay too, for simple cases.)  It would also need to be tweaked not to act
on verbatim/comment text, etc.

Thanks,

~mark




Re: [O] Smart Quotes Exporting

2012-06-07 Thread Nicolas Goaziou
Hello,

Mark E. Shoulson m...@kli.org writes:

 Update on the smart-quotes patch.  Supports the odt exporter now too,
 which I think covers all the current major new exporters for which
 it is relevant (adding smart quotes to ASCII export is a contradiction
 in terms;

ASCII exporter also handles UTF-8. So it's good to have there too.

 should it be in the publish exporter?  It didn't look like it to
 me).

No.

 Added an options keyword, '"' (that is, the double-quote mark) to
 select smart quotes on/off, and a defcustom for customizing your
 default.  Set the default default [sic] to nil, though actually it
 might be reasonable to set it to t.  Slight touch-up to the regexps
 since last time, but they will definitely be subject to a lot of
 fine-tuning as more special cases are found that break them and ways
 to fix it are found (the close-quote still breaks on one of /a/. or
 /a./)

Again, using regexps on plain text objects is a wrong approach, as you
need a better understanding of the whole paragraph structure to handle
quotes properly. I already suggested a possible solution, is there anything
wrong with it?


Regards,

-- 
Nicolas Goaziou



Re: [O] Smart Quotes Exporting

2012-06-05 Thread Mark E. Shoulson
Update on the smart-quotes patch.  Supports the odt exporter now too, 
which I think covers all the current major new exporters for which it 
is relevant (adding smart quotes to ASCII export is a contradiction in 
terms; should it be in the publish exporter?  It didn't look like it 
to me).


Added an options keyword, '"' (that is, the double-quote mark) to select 
smart quotes on/off, and a defcustom for customizing your default.  Set 
the default default [sic] to nil, though actually it might be reasonable 
to set it to t.  Slight touch-up to the regexps since last time, but 
they will definitely be subject to a lot of fine-tuning as more special 
cases are found that break them and ways to fix it are found (the 
close-quote still breaks on one of /a/. or /a./)


It's pretty good on the whole, though, usually guesses right.  I know 
there's some work being done on the odt exporter; hope this fits in well 
with it.


How does it look to you?

~mark

From e6df2efd1a9ce36964a20fc06aa2a688acd87efb Mon Sep 17 00:00:00 2001
From: Mark Shoulson m...@kli.org
Date: Tue, 29 May 2012 23:01:12 -0400
Subject: [PATCH] Add `smart' quotes for onscreen display and for latex and
 html export

* lisp/org.el: Add `smart' quotes: custom variables to define
  regexps to recognize quotes, to define how and whether to
  display them, and org-fontify-quotes to display `smart-quote'
  characters when activated.

* contrib/lisp/org-export.el: Add function org-export-quotation-marks
  as a utility function usable by individual exporters to apply
  `smart' quotes.  Also add keyword '"' for customizing smart quotes,
  and custom default for it.

* contrib/lisp/org-e-latex.el: Replace org-e-latex-quotes custom with
  org-e-latex-quotes-replacements and make org-e-latex--quotation-marks
  use the org-export-quotation-marks function in org-export.el.

* contrib/lisp/org-e-html.el: Replace org-e-html-quotes custom with
  org-e-html-quotes-replacements and enable org-e-html--quotation-marks,
  using org-export-quotation-marks function in org-export.el.

* contrib/lisp/org-e-odt.el: Replace org-e-odt-quotes custom with
  org-e-odt-quotes-replacements and make org-e-odt--quotation-marks
  use org-export-quotations-marks function in org-export.el.
---
 contrib/lisp/org-e-html.el  |   57 
 contrib/lisp/org-e-latex.el |   67 ++---
 contrib/lisp/org-e-odt.el   |   68 ++---
 contrib/lisp/org-export.el  |   38 
 lisp/org.el |  101 +++
 5 files changed, 203 insertions(+), 128 deletions(-)

diff --git a/contrib/lisp/org-e-html.el b/contrib/lisp/org-e-html.el
index 4287a59..c49608d 100644
--- a/contrib/lisp/org-e-html.el
+++ b/contrib/lisp/org-e-html.el
@@ -1043,37 +1043,24 @@ in order to mimic default behaviour:
 
  Plain text
 
-(defcustom org-e-html-quotes
-  '(("fr"
-     ("\\(\\s-\\|[[(]\\|^\\)\"" . "«~")
-     ("\\(\\S-\\)\"" . "~»")
-     ("\\(\\s-\\|(\\|^\\)'" . "'"))
-    ("en"
-     ("\\(\\s-\\|[[(]\\|^\\)\"" . "``")
-     ("\\(\\S-\\)\"" . "''")
-     ("\\(\\s-\\|(\\|^\\)'" . "`")))
-  "Alist for quotes to use when converting english double-quotes.
-
-The CAR of each item in this alist is the language code.
-The CDR of each item in this alist is a list of three CONS:
-- the first CONS defines the opening quote;
-- the second CONS defines the closing quote;
-- the last CONS defines single quotes.
-
-For each item in a CONS, the first string is a regexp
-for allowed characters before/after the quote, the second
-string defines the replacement string for this quote."
+(defcustom org-e-html-smart-quote-replacements
+  '(("fr" "&laquo;&nbsp;" "&nbsp;&raquo;" "&lsquo;" "&rsquo;" "&rsquo;")
+    ("en" "&ldquo;" "&rdquo;" "&lsquo;" "&rsquo;" "&rsquo;")
+    ("de" "&bdquo;" "&ldquo;" "&sbquo;" "&lsquo;" "&rsquo;"))
+  "What to export for `smart-quotes'.
+A list of five strings:
+ 1. Open double-quotes
+ 2. Close double-quotes
+ 3. Open single-quote
+ 4. Close single-quote
+ 5. Mid-word apostrophe"
   :group 'org-export-e-html
   :type '(list
-	  (cons :tag "Opening quote"
-		(string :tag "Regexp for char before")
-		(string :tag "Replacement quote "))
-	  (cons :tag "Closing quote"
-		(string :tag "Regexp for char after ")
-		(string :tag "Replacement quote "))
-	  (cons :tag "Single quote"
-		(string :tag "Regexp for char before")
-		(string :tag "Replacement quote "))))
+	  (string :tag "Open double-quotes")    ; “
+	  (string :tag "Close double-quotes")   ; ”
+	  (string :tag "Open single-quote")     ; ‘
+	  (string :tag "Close single-quote")    ; ’
+	  (string :tag "Mid-word apostrophe"))) ; ’
 
  Compilation
 
@@ -1459,15 +1446,7 @@ This is used to choose a separator for constructs like \\verb.
   Export quotation marks depending on language conventions.
 TEXT is a string containing quotation marks to be replaced.  INFO
 is a plist used as a communication channel.
-  (mapc (lambda(l)
-	  (let ((start 0))
-	(while (setq start (string-match (car l) text start))
-	  (let 

Re: [O] Smart Quotes Exporting

2012-06-02 Thread Mark E. Shoulson
All right, preliminary patch is attached, *maybe* good enough for more 
serious consideration now, but might need some fixes. Still only uses 
ordinary regexps and plain-text strings, but can now handle the example 
with formatting-breaks next to quotes. Things have been moved into more 
appropriate locations, made customs, docstrings and types fixed, etc, etc.


It supports onscreen display of smart quotes (when enabled); I have 
the quotes displayed in org-document-info face so they are slightly 
distinct, to make it clearer that they are altered from what they are 
in the plain text. This may or may not be a popular (or good) idea. I 
have also built it into the new export engine in org-e-latex and 
org-e-html as proofs of concept. I'm not positive the latex one will 
work properly for German, though; there might need to be something 
enabled in LaTeX for it to format ,, into „.


It should probably be set not to smartify quotes onscreen in comments; I 
haven't done that yet.


Comments welcome; I hope I didn't complicate matters in the export 
engines too much.


~mark
From 1bc507cf69c94d5645436abc6e28e7d96999083e Mon Sep 17 00:00:00 2001
From: Mark Shoulson m...@kli.org
Date: Tue, 29 May 2012 23:01:12 -0400
Subject: [PATCH] Add `smart' quotes for onscreen display and for latex and
 html export

* lisp/org.el: Add `smart' quotes: custom variables to define
  regexps to recognize quotes, to define how and whether to
  display them, and org-fontify-quotes to display `smart-quote'
  characters when activated.

* contrib/lisp/org-export.el: Add function org-export-quotation-marks
  as a utility function usable by individual exporters to apply
  `smart' quotes.

* contrib/lisp/org-e-latex.el: Replace org-e-latex-quotes custom with
  org-e-latex-quotes-replacements and make org-e-latex--quotation-marks
  use the org-export-quotation-marks function in org-export.el.

* contrib/lisp/org-e-html.el: Replace org-e-html-quotes custom with
  org-e-html-quotes-replacements and enable org-e-html--quotation-marks,
  using org-export-quotation-marks function in org-export.el.
---
 contrib/lisp/org-e-html.el  |   57 
 contrib/lisp/org-e-latex.el |   67 ++---
 contrib/lisp/org-export.el  |   26 +++
 lisp/org.el |  101 +++
 4 files changed, 168 insertions(+), 83 deletions(-)

diff --git a/contrib/lisp/org-e-html.el b/contrib/lisp/org-e-html.el
index 53547a0..d4a505e 100644
--- a/contrib/lisp/org-e-html.el
+++ b/contrib/lisp/org-e-html.el
@@ -1077,37 +1077,24 @@ in order to mimic default behaviour:
 
  Plain text
 
-(defcustom org-e-html-quotes
-  '(("fr"
-     ("\\(\\s-\\|[[(]\\|^\\)\"" . "«~")
-     ("\\(\\S-\\)\"" . "~»")
-     ("\\(\\s-\\|(\\|^\\)'" . "'"))
-    ("en"
-     ("\\(\\s-\\|[[(]\\|^\\)\"" . "``")
-     ("\\(\\S-\\)\"" . "''")
-     ("\\(\\s-\\|(\\|^\\)'" . "`")))
-  "Alist for quotes to use when converting english double-quotes.
-
-The CAR of each item in this alist is the language code.
-The CDR of each item in this alist is a list of three CONS:
-- the first CONS defines the opening quote;
-- the second CONS defines the closing quote;
-- the last CONS defines single quotes.
-
-For each item in a CONS, the first string is a regexp
-for allowed characters before/after the quote, the second
-string defines the replacement string for this quote."
+(defcustom org-e-html-smart-quote-replacements
+  '(("fr" "&laquo;&nbsp;" "&nbsp;&raquo;" "&lsquo;" "&rsquo;" "&rsquo;")
+    ("en" "&ldquo;" "&rdquo;" "&lsquo;" "&rsquo;" "&rsquo;")
+    ("de" "&bdquo;" "&ldquo;" "&sbquo;" "&lsquo;" "&rsquo;"))
+  "What to export for `smart-quotes'.
+A list of five strings:
+ 1. Open double-quotes
+ 2. Close double-quotes
+ 3. Open single-quote
+ 4. Close single-quote
+ 5. Mid-word apostrophe"
   :group 'org-export-e-html
   :type '(list
-	  (cons :tag "Opening quote"
-		(string :tag "Regexp for char before")
-		(string :tag "Replacement quote "))
-	  (cons :tag "Closing quote"
-		(string :tag "Regexp for char after ")
-		(string :tag "Replacement quote "))
-	  (cons :tag "Single quote"
-		(string :tag "Regexp for char before")
-		(string :tag "Replacement quote "))))
+	  (string :tag "Open double-quotes")    ; “
+	  (string :tag "Close double-quotes")   ; ”
+	  (string :tag "Open single-quote")     ; ‘
+	  (string :tag "Close single-quote")    ; ’
+	  (string :tag "Mid-word apostrophe"))) ; ’
 
  Compilation
 
@@ -1497,15 +1484,7 @@ This is used to choose a separator for constructs like \\verb.
   Export quotation marks depending on language conventions.
 TEXT is a string containing quotation marks to be replaced.  INFO
 is a plist used as a communication channel.
-  (mapc (lambda(l)
-	  (let ((start 0))
-	(while (setq start (string-match (car l) text start))
-	  (let ((new-quote (concat (match-string 1 text) (cdr l
-		(setq text (replace-match new-quote  t t text))
-	(cdr (or (assoc (plist-get info :language) org-e-html-quotes)
-		 ;; Falls back on English.
-		 (assoc en 

Re: [O] Smart Quotes Exporting

2012-06-01 Thread Nicolas Goaziou
Hello,

Mark E. Shoulson m...@kli.org writes:

 Oh, certainly; they're all a disaster.  I think I said that in the
 writeup at the top.  This is just proof of concept, nothing is in the
 right place, nothing is properly documented.  They have to be
 defcustoms, there needs to be a good :type in the defcustom as well as
 a proper docstring.  You'll get no argument from me about the lack (or
 inaccuracy) of docstrings and such.  I hadn't gotten that far yet.
 I said the patch was only if you wanted to tinker with the development
 as this progresses.

No worries, I was just making some comments before forgetting about
them.

 +(defun org-e-latex--quotation-marks (text info)
 +  (org-export-quotation-marks text info org-e-latex-quote-replacements))
 +  ;; (mapc (lambda(l)
 +  ;;  (let ((start 0))
 +  ;;(while (setq start (string-match (car l) text start))
 +  ;;  (let ((new-quote (concat (match-string 1 text) (cdr l))))
 +  ;;(setq text (replace-match new-quote  t t text))))))
 +  ;;(cdr (or (assoc (plist-get info :language) org-e-latex-quotes)
 +  ;; ;; Falls back on English.
 +  ;; (assoc "en" org-e-latex-quotes))))
 +  ;; text)
 Use directly `org-e-latex-quote-replacements' in code then.

 Not sure I understand this comment.

Since `org-e-latex--quotation-marks' just calls
`org-export-quotation-marks', you can remove completely the former from
org-export.el and use the latter instead.

 So... there's the filter-parse-tree-functions hook gets applied within
 the parse tree... so a back-end can add a function to that list which
 looks over the parse-tree and watches for these border cases (and also
 the ones within ordinary strings).  Looks like it's going to be tough
 to work in any flexibility to define further per-language or
 per-backend cleverness to handle anything beyond the canonical set
 of open-double, close-double, open-single, close-single, and mid-word.

 To be sure, anything we do will most assuredly fail even on some
 fairly reasonable input, in which case the users are pretty much on
 their own and will have to do things the hard way.  And I could use
 that as the answer here, that, well, it'll work only within
 plain-text strings (and I might possibly still have to use that
 answer), but I would rather include the situations you bring up in the
 supported set and not throw up my hands at it.  So, yes, will look at
 that.

Actually it isn't very hard to handle this problem. But it will be
different than the fontification used in an Org buffer.

You might want to look at `org-element-normalize-contents', which solves
a similar problem: removing maximum common indentation at the parsed
paragraph level.

As a first approximation, I can imagine a function accepting an element,
an object or a secondary string and returning an equivalent element,
object or secondary string, with its quotes smartified. The algorithm
could go like this:

Walk element/object/secondary-string's contents.

  1. When a string is encountered:

     1. If it has a quote as its first or last position, check for
        objects before or after the string to guess its status. An
        object never starts with a white space, but you may have to
        check :post-blank property in order to know if previous object
        had white spaces at its end.

     2. For each quote everywhere else in the string, your regexp can
        handle it fine.

  2. When an object belonging to `org-element-recursive-objects' is
     encountered, apply the function to this object.

  3. Accumulate returned strings or objects.

Use accumulated data as the contents of the new object to return (i.e.
just add the type and the same properties at the beginning of this list
if it was an object or an element, return it as-is if that was
a secondary string).
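
For illustration only, the walk described above could be sketched roughly
as follows. This is a simplified model, not the real exporter code: the
contents are plain lists of strings and (TYPE PROPERTIES CONTENTS...)
objects, every `my-' name is hypothetical, `my-recursive-types' merely
stands in for `org-element-recursive-objects', and only double quotes are
handled.

```elisp
;; Sketch under assumptions: all `my-' names are made up for this example.
(defvar my-recursive-types '(bold italic underline)
  "Hypothetical stand-in for `org-element-recursive-objects'.")

(defun my-blank-before-p (prev)
  "Non-nil if PREV (nil, a string or an object) ends with white space.
A missing PREV counts as blank, since the text starts there."
  (cond ((null prev) t)
        ((stringp prev) (string-match-p "[ \t\n]\\'" prev))
        (t (> (or (plist-get (nth 1 prev) :post-blank) 0) 0))))

(defun my-smartify-string (string opening-p)
  "Replace straight double quotes in STRING with “ and ”.
OPENING-P tells how to treat a quote at the very start of STRING."
  (let ((s (replace-regexp-in-string
            "\\([ \t\n(]\\)\"" "\\1“" string t))) ; quote after blank: open
    (when (and opening-p (string-prefix-p "\"" s))
      (setq s (concat "“" (substring s 1))))
    (replace-regexp-in-string "\"" "”" s t t)))    ; every other quote: close

(defun my-smartify-contents (contents)
  "Return a copy of CONTENTS with its quotes smartified."
  (let (prev acc)
    (dolist (item contents (nreverse acc))
      (push (cond
             ((stringp item)
              (my-smartify-string item (my-blank-before-p prev)))
             ((memq (car item) my-recursive-types)
              ;; Rebuild the object: same type and properties, new contents.
              (append (list (car item) (nth 1 item))
                      (my-smartify-contents (nthcdr 2 item))))
             (t item))
            acc)
      (setq prev item))))
```

With this model, a closing quote that directly follows a bold object is
resolved by looking at that object's :post-blank value rather than at the
string alone. The first string inside a nested object is simply treated as
if it followed a blank, which is a shortcut a real implementation would
probably have to refine.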

On the elements side, only paragraphs, verse-blocks and table-rows can
directly contain quotes. Also, headline, inlinetask, item and
footnote-reference have secondary strings containing quotes.

I'm not sure yet where and how to install such a function, but I will
think about it when it is implemented.


Regards,

-- 
Nicolas Goaziou



Re: [O] Smart Quotes Exporting

2012-06-01 Thread Mark E. Shoulson

On 06/01/2012 01:11 PM, Nicolas Goaziou wrote:

Hello,

Mark E. Shoulson m...@kli.org writes:


Oh, certainly; they're all a disaster.  I think I said that in the
writeup at the top.  This is just proof of concept, nothing is in the
right place, nothing is properly documented.  They have to be
defcustoms, there needs to be a good :type in the defcustom as well as
a proper docstring.  You'll get no argument from me about the lack (or
inaccuracy) of docstrings and such.  I hadn't gotten that far yet.
I said the patch was only if you wanted to tinker with the development
as this progresses.

No worries, I was just making some comments before forgetting about
them.


Ah, ok.  Good!  Thanks.


+(defun org-e-latex--quotation-marks (text info)
+  (org-export-quotation-marks text info org-e-latex-quote-replacements))
+  ;; (mapc (lambda(l)
+  ;; (let ((start 0))
+  ;;   (while (setq start (string-match (car l) text start))
+  ;; (let ((new-quote (concat (match-string 1 text) (cdr l))))
+  ;;   (setq text (replace-match new-quote  t t text))))))
+  ;;   (cdr (or (assoc (plist-get info :language) org-e-latex-quotes)
+  ;;;; Falls back on English.
+  ;;(assoc "en" org-e-latex-quotes))))
+  ;; text)
Use directly `org-e-latex-quote-replacements' in code then.

Not sure I understand this comment.

Since `org-e-latex--quotation-marks' just calls
`org-export-quotation-marks', you can remove completely the former from
org-export.el and use the latter instead.


Well, that was done on purpose, and maybe the reason will make sense.  
As I see it, each exporter should be able to have its own smartifier 
function, and the export engine should make no assumptions about that: 
just call the individual exporter's function.  On the other hand, many 
(but perhaps not all!) of the exporters may find themselves using 
essentially the same code just with different replacement strings.  So I
thought that the general-purpose function should be in org-export.el,
just for the convenience of exporters should they choose to make use of
it.  So, many
of the exporters' smartifier functions will really just be calls to the 
more general-purpose function.


Does that make sense?


So... there's the filter-parse-tree-functions hook gets applied within
the parse tree... so a back-end can add a function to that list which
looks over the parse-tree and watches for these border cases (and also
the ones within ordinary strings).  Looks like it's going to be tough
to work in any flexibility to define further per-language or
per-backend cleverness to handle anything beyond the canonical set
of open-double, close-double, open-single, close-single, and mid-word.

To be sure, anything we do will most assuredly fail even on some
fairly reasonable input, in which case the users are pretty much on
their own and will have to do things the hard way.  And I could use
that as the answer here, that, well, it'll work only within
plain-text strings (and I might possibly still have to use that
answer), but I would rather include the situations you bring up in the
supported set and not throw up my hands at it.  So, yes, will look at
that.

Actually it isn't very hard to handle this problem. But it will be
different than the fontification used in an Org buffer.
Yes, the fontification on-screen is different, and uses a rather 
different function--but if I can help it, the same regexps!  So things 
work the same everywhere.


I also started thinking a little about what you write below, how we can 
inspect the characters just after or before quotes at the very beginning 
or end of each chunk.  It would be nice if it could all be encapsulated 
neatly in the regexp(s).

As a first approximation, I can imagine a function accepting an element,
an object or a secondary string and returning an equivalent element,
object or secondary string, with its quotes smartified. The algorithm
could go like this:

Walk element/object/secondary-string's contents.


Need it be element/object/secondary-string?  At the bottom level it's 
always about strings; the higher levels don't affect the processing of 
each string in isolation.  Do we need to intercept it at the element 
level or just wait to grab things in the plain-text filter, since we 
have access at that point too?


(Might also be that my understanding of the process and the nature of 
elements is faulty or limited.  Will have to see what works.)




  1. When a string is encountered:

     1. If it has a quote as its first or last position, check for
        objects before or after the string to guess its status. An
        object never starts with a white space, but you may have to
        check :post-blank property in order to know if previous object
        had white spaces at its end.


Hmm, this may in fact answer my question above: you need to be able to 
get at the object level to test the post-blank.  I'll experiment.



  2. For each quote everywhere else in the string, 

Re: [O] Smart quotes

2012-05-29 Thread Nicolas Goaziou
Hello,

Mark E. Shoulson m...@kli.org writes:

 Maybe, if it's all on one line.  But if the quote is several lines
 long, can you sensibly count the levels?

Well, yes.

 I guess it doesn't actually matter, but it starts to get weird if you
 find yourself looking arbitrarily far back, and then you start
 building in exceptions for crossing paragraph boundaries...

True. I had the exporter in mind, where you always start at the
beginning of the paragraph. It would be more difficult with search
starting in the middle of the paragraph.

 And then there's the fact that multi-paragraph quotes usually have an
 open-quote for each paragraph but only one close-quote at the end...

Some French typographers suggest using a close-quote at the beginning
of the paragraph to avoid that confusion, or simply dropping them (since
they are a pain to maintain anyway). I don't know about other languages
but, if that's the same, is it a good idea to bother implementing it?

 Actually keeping count of what level you're at, accurately, is
 a classic example of a non-regular language; you need a push-down
 automaton to keep count, and regular expressions don't cut it.

This is limited to 2 levels.

 I'm rambling.  In sum, I'm going to start off /not/ trying to solve
 that problem, and assume the writer is going to use alternating " and
 ' as typography requires and not try to second-guess what level we're
 at.

You are right, the problem will be easier to solve with both " and '.

Though, "as typography requires" is not true. In France, the /Imprimerie
Nationale/ suggests using guillemets at both levels. Remember that
typography is localized, which is the main difficulty of the
implementation.


Regards,

-- 
Nicolas Goaziou



Re: [O] Smart quotes

2012-05-29 Thread Mark E. Shoulson

On 05/29/2012 01:57 PM, Nicolas Goaziou wrote:

Hello,

Mark E. Shoulson m...@kli.org writes:



I guess it doesn't actually matter, but it starts to get weird if you
find yourself looking arbitrarily far back, and then you start
building in exceptions for crossing paragraph boundaries...

True. I had the exporter in mind, where you always start at the
beginning of the paragraph. It would be more difficult with search
starting in the middle of the paragraph.


Maybe the on-screen stuff is no harder; will just have to see.


And then there's the fact that multi-paragraph quotes usually have an
open-quote for each paragraph but only one close-quote at the end...

Some French typographers suggest using a close-quote at the beginning
of the paragraph to avoid that confusion, or simply dropping them (since
they are a pain to maintain anyway). I don't know about other languages
but, if that's the same, is it a good idea to bother implementing it?


I've never heard of it.  But I think we may be overthinking this; we can 
drive ourselves crazy trying to compress a dozen different typographical 
traditions (and informal customs) into a few Elisp rules.  On the other 
hand, I don't think we need to throw up our hands and give up either! :)



Actually keeping count of what level you're at, accurately, is
a classic example of a non-regular language; you need a push-down
automaton to keep count, and regular expressions don't cut it.

This is limited to 2 levels.

True.

I'm rambling.  In sum, I'm going to start off /not/ trying to solve
that problem, and assume the writer is going to use alternating " and
' as typography requires and not try to second-guess what level we're
at.

You are right, the problem will be easier to solve with both " and '.

Though, "as typography requires" is not true. In France, the /Imprimerie
Nationale/ suggests using guillemets at both levels. Remember that
typography is localized, which is the main difficulty of the
implementation.


Also a good point.

All right, bottom line, this is sort of what I'm seeing.  I'm not 100% 
sure which files should house these things, but something like this:


1) a variable containing, for each language, a regexp for each of: open
double-quote, close double-quote, open single-quote, close single-quote, 
and maybe mid-word apostrophe.  Odds are these regexps are going to be 
the same for just about all languages (the regexps detecting them, mind 
you), so probably should have some sort of default that the alist can 
just reference.  A language should also be allowed to define other quote 
regexps in its list too.  We need these to be ordered, with a standard 
set, so that we can have...


2) for each *exporter* (including on-screen display), a variable that 
defines, for each language, what the *substitution* will be for 
open-double-quote, close-double-quote, etc.  Other extras can be defined 
too.  That way we can have an exporter-independent way to detect quotes 
to be smartified, but each exporter has its own way to smartify them.


3) Since most exporters are probably going to handle the process in
approximately the same way (match the regexp, stick in the
associated substitution), org-export.el should have a generic function
that does this which each exporter *may* call in (or as) its
quote-smartifier in its text translator, unless it needs something more
specific which it can provide itself.
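
To make the three pieces a bit more concrete, here is a rough sketch of
how they might fit together. Nothing here is existing Org code: every
`my-' name, the regexps, and the shape of the alists are assumptions made
for the example. Group 1 (and group 2, for the mid-word case) of each
regexp is the surrounding context, which is kept unchanged.

```elisp
;; Sketch under assumptions: all `my-' names and regexps are hypothetical.

;; 1) Exporter-independent detection, in the order it must be applied.
;;    Singles come first so that LaTeX-style '' output is never re-read.
(defvar my-quote-regexps
  '((open-single  . "\\(^\\|[ \t\n(]\\)'")
    (mid-word     . "\\([[:alnum:]]\\)'\\([[:alnum:]]\\)")
    (close-single . "'")
    (open-double  . "\\(^\\|[ \t\n(]\\)\"")
    (close-double . "\"")))

;; 2) Per-exporter, per-language substitutions.
(defvar my-utf8-quote-replacements
  '(("en" (open-double . "“") (close-double . "”")
          (open-single . "‘") (close-single . "’") (mid-word . "’"))
    ("fr" (open-double . "«") (close-double . "»")
          (open-single . "‹") (close-single . "›") (mid-word . "’"))))

(defvar my-latex-quote-replacements
  '(("en" (open-double . "``") (close-double . "''")
          (open-single . "`") (close-single . "'") (mid-word . "'"))))

;; 3) Generic function that any exporter may call.
(defun my-smartify-quotes (text language replacements)
  "Return TEXT with straight quotes replaced for LANGUAGE.
REPLACEMENTS is one exporter's alist; unknown languages fall back
on English."
  (let ((table (cdr (or (assoc language replacements)
                        (assoc "en" replacements)))))
    (dolist (rule my-quote-regexps text)
      (let ((replacement (cdr (assq (car rule) table))))
        (when replacement
          (setq text (replace-regexp-in-string
                      (cdr rule)
                      (lambda (m)
                        ;; Keep the context groups, swap only the quote.
                        (concat (match-string 1 m)
                                replacement
                                (match-string 2 m)))
                      text t t)))))))
```

An exporter's own smartifier would then be a one-line call such as
(my-smartify-quotes text "fr" my-utf8-quote-replacements), and an exporter
with special needs could ignore the generic function entirely.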


In terms of what is handled, the idea in my head is that we would expect
the writer to be using " or ' to surround their quotes, regardless of
what their native custom is (if they're doing it using their
language-specific quote-marks, we don't need to bother with all this
anyway).  Goal is to handle either "quotes" or 'quotes' in either
nesting (or no nesting, if someone does "quote' for some reason), and
with any luck not get too confused with other uses of apostrophe.


It makes sense to me, but I bet I explained it badly and people are 
going to have all kinds of issues with it. :)


No telling when (if?) I'll be able to produce something along these 
lines, but it's something to start thinking about anyway.


~mark



Re: [O] Smart quotes

2012-05-28 Thread Mark E. Shoulson

On 05/26/2012 02:48 AM, Nicolas Goaziou wrote:

Hello,

Mark E. Shoulson m...@kli.org writes:


The regexp may be able to tell level 1 from level 2 quotes.

Do you mean that the author would use the same characters for both
first and second level quotes, and the regexp would be smart enough to
distinguish which level each was at?  I don't think that's possible,
and you probably don't either.

Actually, I do. Since you can tell an opening quote from a closing one
by the position of the white space (or parenthesis, beginning/end of
line) near it, I think you can deduce the quote level. I may be wrong,
though.


Maybe, if it's all on one line.  But if the quote is several lines long, 
can you sensibly count the levels?  I guess it doesn't actually matter, 
but it starts to get weird if you find yourself looking arbitrarily far 
back, and then you start building in exceptions for crossing paragraph 
boundaries... And then there's the fact that multi-paragraph quotes 
usually have an open-quote for each paragraph but only one close-quote 
at the end... Actually keeping count of what level you're at, 
accurately, is a classic example of a non-regular language; you need a 
push-down automaton to keep count, and regular expressions don't cut 
it.  Then again, Emacs regexps are more powerful than simple regular 
expressions, and we only would want to keep track of even vs odd level 
anyway.


I'm rambling.  In sum, I'm going to start off /not/ trying to solve that 
problem, and assume the writer is going to use alternating " and ' as
typography requires and not try to second-guess what level we're at.  As 
that progresses, maybe I'll come to understand better what can and can't 
(and should and shouldn't) be deduced by the regexps.



"this is a 'quote', and that's all you need to know."

becoming, for instance

«this is a ‹quote›, and that’s all you need to know.»

"this is a "quote", and that's all you need to know" is as parsable to
me.

As a side note, at least in French, many typographers would recommend
"this is a /quote/, and that's all you need to know" here. Oh, and
I know that was just an example.


I see; because I can tell that the second " must be an open-quote and
not closing the first, due to its position relative to the spaces.  It 
does seem possible, but I think I'm going to try not solving that 
problem first.


(And French typography raises other problems, since French puts lots of 
space around the quote-marks, to the extent that French typists typing 
plain-text will often put a space on both sides of a quote-mark, making 
it hard to see whether it opens or closes... another issue, not 
necessarily solvable, to watch for.)


~mark





Re: [O] Smart quotes

2012-05-26 Thread Nicolas Goaziou
Hello,

Mark E. Shoulson m...@kli.org writes:

 The regexp may be able to tell level 1 from level 2 quotes.

 Do you mean that the author would use the same characters for both
 first and second level quotes, and the regexp would be smart enough to
 distinguish which level each was at?  I don't think that's possible,
 and you probably don't either.

Actually, I do. Since you can tell an opening quote from a closing one
by the position of the white space (or parenthesis, beginning/end of
line) near it, I think you can deduce the quote level. I may be wrong,
though.
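
As a rough illustration of that idea (a sketch only, with a hypothetical
function name and a deliberately naive notion of "white space or
parenthesis before the quote"), the level can be tracked with a simple
counter while scanning for the same " character at both levels:

```elisp
;; Sketch under assumptions: `my-quote-levels' is a hypothetical helper.
(defun my-quote-levels (text)
  "Return a list of (POSITION KIND LEVEL) for each \" in TEXT.
KIND is `open' when the quote starts TEXT or follows white space or
an opening parenthesis, and `close' otherwise.  LEVEL is the nesting
depth of the quotation the quote belongs to."
  (let ((depth 0) (start 0) result)
    (while (string-match "\"" text start)
      (let* ((pos (match-beginning 0))
             (opening (or (= pos 0)
                          (memq (aref text (1- pos))
                                '(?\s ?\t ?\n ?\()))))
        (if opening
            (progn (setq depth (1+ depth))
                   (push (list pos 'open depth) result))
          (push (list pos 'close depth) result)
          (setq depth (max 0 (1- depth))))
        (setq start (1+ pos))))
    (nreverse result)))
```

On the example sentence with " used at both levels, this yields open 1,
open 2, close 2, close 1. It breaks down when a quote has white space on
both sides, or when the scan starts in the middle of a quotation.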

 "this is a 'quote', and that's all you need to know."

 becoming, for instance

 «this is a ‹quote›, and that’s all you need to know.»

"this is a "quote", and that's all you need to know" is as parsable to
me.

As a side note, at least in French, many typographers would recommend
"this is a /quote/, and that's all you need to know" here. Oh, and
I know that was just an example.

 I'd love to get org more export-friendly.  I'll see what I can
 understand of the (new) export code.

Do not hesitate to ask questions about it.


Regards,

-- 
Nicolas Goaziou



Re: [O] Smart quotes

2012-05-25 Thread Nicolas Goaziou
Hello,

Mark E. Shoulson m...@kli.org writes:

 Hm.  I like the idea, but it raises some questions for me.  It would
 be particularly good if this could share code/custom variables with
 the pieces of the (new) exporter that make smart quotes on export.
 That way we could be sure that what it looks like onscreen would also
 be what it looked like when exported.

It could be interesting, but keep in mind that no matter how smart your
quotes are, they will fail in some situations. So, it will have to be
optional for export, independently of their in-buffer status.

The OPTIONS keyword may be used, with q:t and q:nil items.

 Looking at contrib/lisp/org-e-latex.el at an upcoming exporter for
 such things, I see a variable org-e-latex-quotes, which has nice
 language-aware parts... but misses an important point.  Each language
 gets to define one regexp for opening quotes, one for closing quotes,
 and one for single quotes.  But don't we want to talk about (at least)
 two levels of quotes, see your own reference[fn:1]?

Probably. But that's going to be somewhat harder.

 Single quotes would be for inner, second-level quotes (if we're using
 double straight quotes according to (American) English usage, I would
 guess we'd be using single straight quotes the same way).  That works
 okay for English, where a single apostrophe not part of a grouping
 construct is going to be interpreted as a close single quote and
 look right for an apostrophe.

The regexp may be able to tell level 1 from level 2 quotes.

 It might not work so good in French where apostrophes are also used,

There are no spaces around apostrophes, so they shouldn't be caught by
the regexp.

 but also single guillemets for inner-level quotes.

What are single guillemets? I don't think there is such thing in French.

 Should/can we consider extending this for the new exporters?

I think it would be a good addition to the export mechanism, if you want
to give it a try.

 (I'm looking forward to HTML and ODT exporters that can do smart
 quotes; the straight quotes are really the main jarring things about
 using Org as a lightweight markup and exporting into something
 fancier)

A function, provided in org-export, could help changing dumb quotes into
smart quotes in plain text. Then, it would be easier for back-ends to
provide the feature, if they wanted to.


Regards,

-- 
Nicolas Goaziou



Re: [O] Smart quotes

2012-05-25 Thread Jambunathan K

 It could be interesting, but keep in mind that no matter how smart your
 quotes are, they will fail in some situations. So, it will have to be
 optional for export, independently of their in-buffer status.

 The OPTIONS keyword may be used, with q:t and q:nil items.

I don't see an entry for this in `org-export-options-alist'.  So I
believe you are soliciting opinion on a fresh addition.

 (I'm looking forward to HTML and ODT exporters that can do smart
 quotes; the straight quotes are really the main jarring things about
 using Org as a lightweight markup and exporting into something
 fancier)

 A function, provided in org-export, could help changing dumb quotes into
 smart quotes in plain text. Then, it would be easier for back-ends to
 provide the feature, if they wanted to.

I can use it, if made available.  I think it would help if we forced
all exporters to produce UTF-8 files.
-- 



Re: [O] Smart quotes

2012-05-25 Thread Mark E. Shoulson

On 05/25/2012 01:14 PM, Nicolas Goaziou wrote:

Hello,

Mark E. Shoulson m...@kli.org writes:


Hm.  I like the idea, but it raises some questions for me.  It would
be particularly good if this could share code/custom variables with
the pieces of the (new) exporter that make smart quotes on export.
That way we could be sure that what it looks like onscreen would also
be what it looked like when exported.

It could be interesting, but keep in mind that no matter how smart your
quotes are, they will fail in some situations. So, it will have to be
optional for export, independently of their in-buffer status.

The OPTIONS keyword may be used, with q:t and q:nil items.


Smart quotes absolutely have to be optional, and probably disabled by 
default.  They're going to fail sometimes, so they should only be there 
when you ask for them.  Smart-quotes-for-export and 
smart-quotes-onscreen need to be settable independently, yes.  
Smart-quotes-for-export needs to be settable per-file/per-buffer, with 
OPTIONS or something.  Smart-quotes-onscreen doesn't have to be 
buffer-local, though it might be a good idea.  Using q:t or maybe ":t in
options seems perfectly good for setting exporting smart quotes.  It 
still would be good if onscreen and export could share code.



Looking at contrib/lisp/org-e-latex.el at an upcoming exporter for
such things, I see a variable org-e-latex-quotes, which has nice
language-aware parts... but misses an important point.  Each language
gets to define one regexp for opening quotes, one for closing quotes,
and one for single quotes.  But don't we want to talk about (at least)
two levels of quotes, see your own reference[fn:1]?

Probably. But that's going to be somewhat harder.


Single quotes would be for inner, second-level quotes (if we're using
double straight quotes according to (American) English usage, I would
guess we'd be using single straight quotes the same way).  That works
okay for English, where a single apostrophe not part of a grouping
construct is going to be interpreted as a close single quote and
look right for an apostrophe.

The regexp may be able to tell level 1 from level 2 quotes.


Do you mean that the author would use the same characters for both first 
and second level quotes, and the regexp would be smart enough to 
distinguish which level each was at?  I don't think that's possible, and 
you probably don't either.  What I meant, and you probably did as well, 
was that if we use apostrophes for second-level quotes, a regexp can be 
smart enough to tell the difference between a second-level quote and a 
non-quote apostrophe



It might not work so good in French where apostrophes are also used,

There are no spaces around apostrophes, so they shouldn't be caught by
the regexp.


which is what you say here.  They *should* be caught by a regexp, but 
not the same one; they need to be smartified also, just not necessarily 
treated the same as second-level quotes.



but also single guillemets for inner-level quotes.

What are single guillemets? I don't think there is such thing in French.


You're right; the Wikipedia page says that French uses quote-marks or 
the same double-chevrons for inner quotes.  I thought it used \lsaquo 
and \rsaquo, « like ‹ this › ».  Looks like it does in Swiss typography 
for various languages, according to the page.  Danish also uses the 
single-chevrons (pointing the other direction), and Azerbaijani and 
Basque, etc... Whatever.  What I meant was, if people are going to be 
writing using straight ascii quotes and expect them to be changed into 
language-appropriate quotes, they're going to want something like


"this is a 'quote', and that's all you need to know."

becoming, for instance

«this is a ‹quote›, and that’s all you need to know.»

that is, it should be possible to use the single quotes for inner 
quotes, which would mean more than just opening/closing/single in the 
org-e-latex-quotes (and analogous variables in other exporters).  Being 
able to determine when you need ‹› and when ’ might be a little 
uncertain, but it isn't hard to make a regexp that can make a decent 
guess at it.



Should/can we consider extending this for the new exporters?

I think it would be a good addition to the export mechanism, if you want
to give it a try.


I'd love to get org more export-friendly.  I'll see what I can 
understand of the (new) export code.



(I'm looking forward to HTML and ODT exporters that can do smart
quotes; the straight quotes are really the main jarring things about
using Org as a lightweight markup and exporting into something
fancier)

A function, provided in org-export, could help changing dumb quotes into
smart quotes in plain text. Then, it would be easier for back-ends to
provide the feature, if they wanted to.
That sounds like a possibility, might make for good generic handling, 
only one bit of code to treat everything consistently... yeah, I didn't 
like the idea at first, I'm starting to like it more.  

Re: [O] Smart quotes

2012-05-23 Thread Nicolas Goaziou
Hello,


Mark E. Shoulson m...@kli.org writes:

 Smart quotes can be annoying when they aren't smart enough. But when
 they work you can miss them. I'm attaching a patch that defines a
 custom variable org-smart-quotes (nil by default), which when non-nil
 causes the " and ' characters to display as “smart” quotes, hopefully
 the right ones. They're still ' and " in the underlying text, just
 overlaid with “”.
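
For readers without the attachment, the general technique (changing what
is displayed while leaving the buffer text alone) can be sketched with the
`display' text property. This is not the code from the patch: the function
name is hypothetical, only double quotes are handled, and a real
implementation would more likely hook into font-lock.

```elisp
;; Sketch under assumptions: `my-display-smart-quotes' is hypothetical.
(defun my-display-smart-quotes (beg end)
  "Show straight double quotes between BEG and END as “ or ”.
Only the `display' text property changes; the buffer text does not."
  (save-excursion
    (goto-char beg)
    (while (search-forward "\"" end t)
      (let* ((pos (match-beginning 0))
             ;; A quote at the start of the buffer, or after white
             ;; space or an opening parenthesis, is taken as opening.
             (opening (or (= pos (point-min))
                          (memq (char-before pos)
                                '(?\s ?\t ?\n ?\()))))
        (put-text-property pos (1+ pos)
                           'display (if opening "“" "”"))))))
```

Calling it on a region changes how the quotes look on screen, while
`buffer-substring-no-properties' still returns the original straight
quotes.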

This is not related to entities, so code shouldn't be in org-entities.el.

Also, quotes are dependent on locale[fn:1]. English/US only quotes look
like a niche to me. Would it be possible to modify the patch and have
this feature handle LANGUAGE keyword, or at least have a support for it?


Regards,

[fn:1] https://en.wikipedia.org/wiki/Non-English_usage_of_quotation_marks

-- 
Nicolas Goaziou



Re: [O] Smart quotes

2012-05-23 Thread Mark E. Shoulson

On 05/23/2012 06:17 PM, Nicolas Goaziou wrote:

Hello,


Mark E. Shoulson m...@kli.org writes:


Smart quotes can be annoying when they aren't smart enough. But when
they work you can miss them. I'm attaching a patch that defines a
custom variable org-smart-quotes (nil by default), which when non-nil
causes the " and ' characters to display as “smart” quotes, hopefully
the right ones. They're still ' and " in the underlying text, just
overlaid with “”.

This is not related to entities, so code shouldn't be in org-entities.el.

Agreed.



Also, quotes are dependent on locale[fn:1]. English/US only quotes look
like a niche to me. Would it be possible to modify the patch and have
this feature handle LANGUAGE keyword, or at least have a support for it?
Hm.  I like the idea, but it raises some questions for me.  It would be 
particularly good if this could share code/custom variables with the 
pieces of the (new) exporter that make smart quotes on export.  That way 
we could be sure that what it looks like onscreen would also be what it 
looked like when exported.  Looking at contrib/lisp/org-e-latex.el at an 
upcoming exporter for such things, I see a variable org-e-latex-quotes, 
which has nice language-aware parts... but misses an important point.  
Each language gets to define one regexp for opening quotes, one for 
closing quotes, and one for single quotes.  But don't we want to talk 
about (at least) two levels of quotes, see your own reference[fn:1]?  
Single quotes would be for inner, second-level quotes (if we're using 
double straight quotes according to (American) English usage, I would 
guess we'd be using single straight quotes the same way).  That works 
okay for English, where a single apostrophe not part of a grouping 
construct is going to be interpreted as a close single quote and look 
right for an apostrophe.  It might not work so good in French where 
apostrophes are also used, but also single guillemets for inner-level 
quotes.  Does the setup there need to be smarter, or at least more 
extensible, to allow for more than exactly three entries?  Clever enough 
regexps could distinguish inner quotes from apostrophes, etc.  
Should/can we consider extending this for the new exporters?


(I'm looking forward to HTML and ODT exporters that can do smart quotes; 
the straight quotes are really the main jarring things about using Org 
as a lightweight markup and exporting into something fancier)


~mark