Re: [O] Smart quotes not working with some languages?
Thanks! I prepared a patch as you've suggested (attached).

On Thu, Nov 14, 2013 at 3:15 AM, Rasmus ras...@gmx.us wrote:
> Hi Daniil,
>
> Daniil Frumin difru...@gmail.com writes:
>> Hi! I am using the latest org from git and I can't get org to export (I need LaTeX export particularly) files with smart quotes.
>
> You need to add it to the variable org-export-smart-quotes-alist defined in ox.el. Put your cursor on the variable and do C-h v to read about it.
>
> Once you have added support for Russian, please make a patch and submit it here so that other people can benefit from your labor.
>
> Thanks,
> Rasmus
>
> --
> Dung makes an excellent fertilizer

--
Sincerely yours,
-- Daniil

Attachment: 0001-Adding-support-for-russian-to-smart-quotes-exporter.patch
Re: [O] Smart quotes not working with some languages?
Ciao Daniil,

Daniil Frumin difru...@gmail.com writes:
> I prepared a patch as you've suggested (attached).

Are the Unicode entities correct? In my UTF-8 Emacs they look like, e.g., \302\253. It could be OK, though, and I could just be missing a font that has these glyphs.

Also, please change the commit message as described here:

  http://orgmode.org/worg/org-contribute.html#sec-5

(Hint: git rebase -i origin/master. You probably already know more git than me, but on the off chance that you don't, I've included a suitable command.)

If you haven't signed FSF papers you should end the commit message with TINYCHANGE. When resubmitting, start the subject of your message with [PATCH].

Thanks again,
Rasmus

--
⠠⠵
Re: [O] Smart quotes not working with some languages?
Hi Daniil,

Daniil Frumin difru...@gmail.com writes:
> Hi! I am using the latest org from git and I can't get org to export (I need LaTeX export particularly) files with smart quotes.

You need to add it to the variable org-export-smart-quotes-alist defined in ox.el. Put your cursor on the variable and do C-h v to read about it.

Once you have added support for Russian, please make a patch and submit it here so that other people can benefit from your labor.

Thanks,
Rasmus

--
Dung makes an excellent fertilizer
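For readers following Rasmus's advice, here is a minimal sketch of what a Russian entry could look like. It is not the patch attached in this thread. The variable `my/smart-quotes-ru' is hypothetical, the rule names (`opening-double-quote' and so on) are an assumption about the Org version in use (newer releases use different names), and the LaTeX strings assume a font encoding that turns << and >> into guillemets. Check C-h v org-export-smart-quotes-alist before copying anything.

```emacs-lisp
;; Sketch only.  The rule names below are an assumption about the Org
;; version in use; confirm them with
;;   C-h v org-export-smart-quotes-alist
(defvar my/smart-quotes-ru
  '("ru"
    ;; LaTeX values assume a font encoding (for instance T2A) in which
    ;; "<<" and ">>" are rendered as guillemets.
    (opening-double-quote :utf-8 "«" :html "&laquo;" :latex "{}<<")
    (closing-double-quote :utf-8 "»" :html "&raquo;" :latex ">>{}")
    (opening-single-quote :utf-8 "„" :html "&bdquo;" :latex ",,")
    (closing-single-quote :utf-8 "“" :html "&ldquo;" :latex "``")
    (apostrophe :utf-8 "’" :html "&rsquo;"))
  "Hypothetical smart-quote rules for Russian.")

;; Register the entry once the export framework has been loaded.
(with-eval-after-load 'ox
  (add-to-list 'org-export-smart-quotes-alist my/smart-quotes-ru))
```

With something like this in place, a file using `#+LANGUAGE: ru' should pick up the new rules on export, provided smart quotes are enabled for that file.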
Re: [O] Smart Quotes Exporting
Hi Mark and Nicolas,

in the patchwork¹, I've marked patches² related to this discussion as Not Applicable. If progress is made on this front, please send updated patches. If there is a patch below that I should apply, please let me know.

Thanks!

¹ http://patchwork.newartisans.com/project/org-mode/list/

² Here are the patches:
  http://patchwork.newartisans.com/patch/1330/
  http://patchwork.newartisans.com/patch/1344/
  http://patchwork.newartisans.com/patch/1346/
  http://patchwork.newartisans.com/patch/1348/

--
Bastien
Re: [O] Smart Quotes Exporting
Hello,

Mark Shoulson m...@kli.org writes:
> Well, wait; regexps can make some pretty darn good guesses at the beginnings or ends of strings.

I know that. They do a good job, I just want a better one.

> This isn't quite it; beginning-of-string followed by quote, then punctuation and then spaces is also a close-quote, etc... There is a lot of fine-tuning. But even what I currently have was able to handle your Caesar said, "/Alea Jacta est./" example.

No, it doesn't handle that, actually, it's just sheer luck. Indeed, the quoting function is applied to "\"". There's absolutely no space, punctuation, etc. to save the day. So it makes a wild guess with a probability of 0.5 of success. Since the guess is always the same, "/a/" will always fail.

> The case where a quote both sits at the edge of a string (i.e. at the border of some element, formatting, etc) *and* does not have whitespace next to it, with possible punctuation, does not seem to be a normal occurrence to me. If I'm wrong, how common *is* it?

Even if it rarely happens, it can be _very_ annoying to have to cope with bad guesses. If it can be avoided, I see no reason not to do so.

Now, here is the infrastructure I propose. Internally, the two following functions are required.

#+begin_src emacs-lisp
(defun org-export--smart-quotes-in-element (element backend)
  "Replace plain quotes with smart quotes in ELEMENT.
ELEMENT is an Org element or a secondary string.  BACKEND is the
back-end to check for rules, as a symbol.  This is a destructive
operation.  Return new element."
  (let* ((type (org-element-type element))
         (properties (and type (nth 1 element))))
    ;; Destructively apply changes to secondary string, if any.
    (let ((secondary (and type
                          (assq type org-element-secondary-value-alist))))
      (when secondary
        (let* ((sec-symbol (cdr secondary))
               (sec-value (plist-get properties sec-symbol)))
          (when sec-value
            (setq properties
                  (plist-put properties sec-symbol
                             (org-export--smart-quotes-in-element
                              sec-value backend)))))))
    ;; Destructively change `:caption' if present.  Since it's a dual
    ;; keyword, apply smart quotes to both CAR and CDR, if required.
    (let ((caption (plist-get properties :caption)))
      (when caption
        (setq properties
              (plist-put properties :caption
                         (cons (org-export--smart-quotes-in-element
                                (car caption) backend)
                               (and (cdr caption)
                                    (org-export--smart-quotes-in-element
                                     (cdr caption) backend)))))))
    ;; Recursively apply changes to contents.  Rebuild ELEMENT along
    ;; the way, with updated strings.
    (let ((contents (if type (org-element-contents element) element))
          previous current next acc)
      (while contents
        ;; PREVIOUS must be saved before CURRENT moves forward.
        (setq previous current
              current (pop contents)
              next (car contents))
        (push
         (cond
          ((stringp current)
           ;; CURRENT is a string: Call `org-export-quotation-marks'
           ;; with appropriate information.
           (org-export-quotation-marks
            current
            (and previous
                 (if (stringp previous)
                     (length (and (string-match " +\\'" previous)
                                  (match-string 0 previous)))
                   (org-element-property :post-blank previous)))
            (and next
                 (if (not (stringp next)) 0
                   (length (and (string-match "\\` +" next)
                                (match-string 0 next)))))
            backend))
          ;; CURRENT is recursive: Move into it.
          ((org-element-property :contents-begin current)
           (org-export--smart-quotes-in-element current backend))
          ;; Otherwise, just accumulate CURRENT.
          (t current))
         acc))
      ;; Re-build transformed element.
      (if (or (not type) (eq type 'plain-text)) (nreverse acc)
        (nconc (list type properties) (nreverse acc))))))

(defun org-export-set-smart-quotes (tree backend info)
  "Replace plain quotes with smart quotes in TREE.
BACKEND is the back-end, as a symbol, used for transcoding.  INFO
is a plist used as a communication channel.  This is
a destructive operation.

This function is meant to be used as a parse tree filter for
back-ends activating smart quotes."
  ;; Destructively apply smart quotes to parsed keywords in info.
  (let ((value (plist-get info :title)))
    (when value
      (setq info
            (plist-put info :title
                       (org-export--smart-quotes-in-element value backend)))))
  ;;
Re: [O] Smart Quotes Exporting
Nicolas Goaziou n.goaziou at gmail.com writes:
> Hello,
>
> Mark Shoulson mark at kli.org writes:
>>> ASCII exporter also handles UTF-8. So it's good to have there too.
>>
>> Really? I would have thought ASCII meant ASCII, as in 7-bit clean text.
>
> org-e-ascii.el (as old org-ascii.el) handles ASCII, Latin1 and UTF-8 encodings.

I noticed that after writing my response. The name just threw me a little. Yes, that exporter needs to handle it too.

>> It looked to me like your solution would essentially boil down to do string handling when there's a string, otherwise recur down and find the strings, which essentially means apply it to all the strings... and there were already functions out there applying things to strings, so this can just ride along with them. Here, let's look at your suggestion and see if we can find what I missed:
>>
>> So, if it's a string, use the regexps (if they can be smart enough to look at beginning and end of the string, which they can--though I haven't been using the :post-blank property so presumably something is amiss), and if it isn't a string, recur down until you get to a string... Ah, but only if it's in org-element-recursive-objects.
>
> You're missing an important part: the regexps cannot be smart enough for quotes at the beginning or the end of the string. There, you must look outside the string. Hence:

Well, wait; regexps can make some pretty darn good guesses at the beginnings or ends of strings. Quotations don't normally end in spaces (in the conventions used with "; French typography is different, but if you're using spaces around your quotes you have worse problems (line-breaks) to worry about). So if a string ends in space(s) followed by a quote, it's very likely that quote is an open-quote for some stuff that comes after. Conversely, if a string starts with a quote followed by some spaces, it's very likely a close-quote to what went on before.

This isn't quite it; beginning-of-string followed by quote, then punctuation and then spaces is also a close-quote, etc... There is a lot of fine-tuning. But even what I currently have was able to handle your Caesar said, "/Alea Jacta est./" example.

Yes, there are edge-cases which this won't catch, and it remains to be seen how pervasive and annoying those are. It may be that repeated tweaking of regexps will handle enough of the ordinary cases. It may be that after a few rounds of regexp-hacking someone will finally decide that regexp-hacking just won't handle enough of the important cases. But I think even as it stands now we'd probably handle 80-90% of the normal situations, which really is as much as we reasonably can hope for.

Could I trouble someone to try applying my patch and trying it out for yourself and seeing just how bad/good the performance is? It seems to work okay for the cases I've been trying, but maybe my dataset isn't robust enough. Let's give it a test and see how many actual cases in common usage it gets wrong. Maybe see how much can be fixed by tuning regexps.

> ] 1. If it has a quote as its first or last position, check for
> ]    objects before or after the string to guess its status. An
> ]    object never starts with a white space, but you may have to
> ]    check :post-blank property in order to know if previous object
> ]    had white spaces at its end.
>
> But you can only do that from the element containing the string, not from the string itself.

The case where a quote both sits at the edge of a string (i.e. at the border of some element, formatting, etc) *and* does not have whitespace next to it, with possible punctuation, does not seem to be a normal occurrence to me. If I'm wrong, how common *is* it?

>> So the issue with the current state is that it would wind up applying to too much? (it would hit code and verbatim elements, for example, and that would be wrong.)
>
> No, you are not applying it too much (verbatim elements don't contain plain-text objects) but your function hasn't got access to enough information to be useful.

The on-screen version, of course, will have to be smarter and check for the face formatting to make sure it doesn't happen in comments or verbatims; I am pretty sure it does not do that yet.

>> wait, called on the top-level parsed tree object, recursively doing its thing before(?) the transcoders of the individual objects get to it.
>
> That's called a parse tree filter. That should be a possibility indeed. The function would be applied on the parse tree and would replace strings within elements containing plain text (that is paragraph, verse-block and table-row types). Parse tree filters are applied very early in the export process.
>
> Another option would be to integrate it into `org-element-normalize-contents', but I think the previous way is better.

Maybe. I know it sounds like I'm fixated on the plain-text solution, but I'm not convinced the
Re: [O] Smart Quotes Exporting
Hello,

Mark Shoulson m...@kli.org writes:
>> ASCII exporter also handles UTF-8. So it's good to have there too.
>
> Really? I would have thought ASCII meant ASCII, as in 7-bit clean text.

org-e-ascii.el (as old org-ascii.el) handles ASCII, Latin1 and UTF-8 encodings.

> It looked to me like your solution would essentially boil down to do string handling when there's a string, otherwise recur down and find the strings, which essentially means apply it to all the strings... and there were already functions out there applying things to strings, so this can just ride along with them. Here, let's look at your suggestion and see if we can find what I missed:
>
> ] Walk element/object/secondary-string's contents.
> ]
> ] 1. When a string is encountered:
> ]
> ]    1. If it has a quote as its first or last position, check for
> ]       objects before or after the string to guess its status. An
> ]       object never starts with a white space, but you may have to
> ]       check :post-blank property in order to know if previous object
> ]       had white spaces at its end.
> ]
> ]    2. For each quote everywhere else in the string, your regexp can
> ]       handle it fine.
> ]
> ] 2. When an object belonging to `org-element-recursive-objects' is
> ]    encountered, apply the function to this object.
> ]
> ] 3. Accumulate returned strings or objects.
>
> So, if it's a string, use the regexps (if they can be smart enough to look at beginning and end of the string, which they can--though I haven't been using the :post-blank property so presumably something is amiss), and if it isn't a string, recur down until you get to a string... Ah, but only if it's in org-element-recursive-objects.

You're missing an important part: the regexps cannot be smart enough for quotes at the beginning or the end of the string. There, you must look outside the string. Hence:

] 1. If it has a quote as its first or last position, check for
]    objects before or after the string to guess its status. An
]    object never starts with a white space, but you may have to
]    check :post-blank property in order to know if previous object
]    had white spaces at its end.

But you can only do that from the element containing the string, not from the string itself.

> So the issue with the current state is that it would wind up applying to too much? (it would hit code and verbatim elements, for example, and that would be wrong.)

No, you are not applying it too much (verbatim elements don't contain plain-text objects) but your function hasn't got access to enough information to be useful.

> So it remains to find the right place in the processing to put a function like the one you describe. I'm trying to get a proper understanding of the code structure to see what you mean. Looks like it should be something like a transcoder, only called on everything...

Transcoders are type specific, so that's not an option.

> wait, called on the top-level parsed tree object, recursively doing its thing before(?) the transcoders of the individual objects get to it.

That's called a parse tree filter. That should be a possibility indeed. The function would be applied on the parse tree and would replace strings within elements containing plain text (that is paragraph, verse-block and table-row types). Parse tree filters are applied very early in the export process.

Another option would be to integrate it into `org-element-normalize-contents', but I think the previous way is better.

> The on-screen one would still use the plain-string computation, as you said, since the full parse isn't available.

Yes.

> It would also need to be tweaked not to act on verbatim/comment text, etc.

Yes. You may want to use `org-element-at-point' and `org-element-type' to tell if you're somewhere smart quotes are allowed (in table, table-row, paragraph, verse-block elements).

Regards,

--
Nicolas Goaziou
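The check suggested at the end of this message could be sketched as follows. This is only an illustration: `my/smart-quotes-allowed-p' is a hypothetical name, and the list of element types is simply the one given above.

```emacs-lisp
(require 'org)
(require 'org-element)

(defun my/smart-quotes-allowed-p ()
  "Return non-nil when the element at point may contain smart quotes.
Hypothetical helper: only the element types named in the message
above are accepted."
  (memq (org-element-type (org-element-at-point))
        '(table table-row paragraph verse-block)))
```

Since `org-element-at-point' works at the element level, inline code or verbatim text inside a paragraph is not distinguished by this test; an on-screen fontifier would probably need a finer check (for instance with `org-element-context') for those.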
Re: [O] Smart Quotes Exporting
Nicolas Goaziou n.goaziou at gmail.com writes:
> Hello,
>
> Mark E. Shoulson mark at kli.org writes:
>> Update on the smart-quotes patch. Supports the odt exporter now too, which I think covers all the current major new exporters for which it is relevant (adding smart quotes to ASCII export is a contradiction in terms;
>
> ASCII exporter also handles UTF-8. So it's good to have there too.

Really? I would have thought ASCII meant ASCII, as in 7-bit clean text. More of a plain text exporter then. Fair enough. I'll work it in.

>> should it be in the publish exporter? It didn't look like it to me).
>
> No.

OK, good.

>> Added an options keyword, '"' (that is, the double-quote mark) to select smart quotes on/off, and a defcustom for customizing your default. Set the default default [sic] to nil, though actually it might be reasonable to set it to t.
>>
>> Slight touch-up to the regexps since last time, but they will definitely be subject to a lot of fine-tuning as more special cases are found that break them and ways to fix it are found (the close-quote still breaks on one of "/a/". or "/a./")
>
> Again, using regexps on plain text objects is a wrong approach, as you need a better understanding of the whole paragraph structure to do it properly. I already suggested a possible solution, is there anything wrong with it?

It looked to me like your solution would essentially boil down to do string handling when there's a string, otherwise recur down and find the strings, which essentially means apply it to all the strings... and there were already functions out there applying things to strings, so this can just ride along with them. Here, let's look at your suggestion and see if we can find what I missed:

] Walk element/object/secondary-string's contents.
]
] 1. When a string is encountered:
]
]    1. If it has a quote as its first or last position, check for
]       objects before or after the string to guess its status. An
]       object never starts with a white space, but you may have to
]       check :post-blank property in order to know if previous object
]       had white spaces at its end.
]
]    2. For each quote everywhere else in the string, your regexp can
]       handle it fine.
]
] 2. When an object belonging to `org-element-recursive-objects' is
]    encountered, apply the function to this object.
]
] 3. Accumulate returned strings or objects.

So, if it's a string, use the regexps (if they can be smart enough to look at beginning and end of the string, which they can--though I haven't been using the :post-blank property so presumably something is amiss), and if it isn't a string, recur down until you get to a string... Ah, but only if it's in org-element-recursive-objects.

So the issue with the current state is that it would wind up applying to too much? (it would hit code and verbatim elements, for example, and that would be wrong.) And detecting such things at the string level would be the wrong place...

So it remains to find the right place in the processing to put a function like the one you describe. I'm trying to get a proper understanding of the code structure to see what you mean. Looks like it should be something like a transcoder, only called on everything... wait, called on the top-level parsed tree object, recursively doing its thing before(?) the transcoders of the individual objects get to it. So almost something replacing the (lambda (blob contents info) contents) stub in org-export-transcoder; does that make sense to you? Otherwise, called somehow in org-export-data. In either case made a hook of some kind so that it is backend-specific. Does it sound like I am understanding this right, to you?

The on-screen one would still use the plain-string computation, as you said, since the full parse isn't available. And that seems to work okay (the export works okay too, for simple cases.) It would also need to be tweaked not to act on verbatim/comment text, etc.

Thanks,

~mark
Re: [O] Smart Quotes Exporting
Hello,

Mark E. Shoulson m...@kli.org writes:
> Update on the smart-quotes patch. Supports the odt exporter now too, which I think covers all the current major new exporters for which it is relevant (adding smart quotes to ASCII export is a contradiction in terms;

ASCII exporter also handles UTF-8. So it's good to have there too.

> should it be in the publish exporter? It didn't look like it to me).

No.

> Added an options keyword, '"' (that is, the double-quote mark) to select smart quotes on/off, and a defcustom for customizing your default. Set the default default [sic] to nil, though actually it might be reasonable to set it to t.
>
> Slight touch-up to the regexps since last time, but they will definitely be subject to a lot of fine-tuning as more special cases are found that break them and ways to fix it are found (the close-quote still breaks on one of "/a/". or "/a./")

Again, using regexps on plain text objects is a wrong approach, as you need a better understanding of the whole paragraph structure to do it properly. I already suggested a possible solution, is there anything wrong with it?

Regards,

--
Nicolas Goaziou
Re: [O] Smart Quotes Exporting
Update on the smart-quotes patch. Supports the odt exporter now too, which I think covers all the current major new exporters for which it is relevant (adding smart quotes to ASCII export is a contradiction in terms; should it be in the publish exporter? It didn't look like it to me).

Added an options keyword, '"' (that is, the double-quote mark) to select smart quotes on/off, and a defcustom for customizing your default. Set the default default [sic] to nil, though actually it might be reasonable to set it to t.

Slight touch-up to the regexps since last time, but they will definitely be subject to a lot of fine-tuning as more special cases are found that break them and ways to fix it are found (the close-quote still breaks on one of "/a/". or "/a./") It's pretty good on the whole, though, usually guesses right.

I know there's some work being done on the odt exporter; hope this fits in well with it.

How does it look to you?

~mark

From e6df2efd1a9ce36964a20fc06aa2a688acd87efb Mon Sep 17 00:00:00 2001
From: Mark Shoulson m...@kli.org
Date: Tue, 29 May 2012 23:01:12 -0400
Subject: [PATCH] Add `smart' quotes for onscreen display and for latex and html export

* lisp/org.el: Add `smart' quotes: custom variables to define regexps to recognize quotes, to define how and whether to display them, and org-fontify-quotes to display `smart-quote' characters when activated.

* contrib/lisp/org-export.el: Add function org-export-quotation-marks as a utility function usable by individual exporters to apply `smart' quotes. Also add keyword '"' for customizing smart quotes, and custom default for it.

* contrib/lisp/org-e-latex.el: Replace org-e-latex-quotes custom with org-e-latex-quotes-replacements and make org-e-latex--quotation-marks use the org-export-quotation-marks function in org-export.el.

* contrib/lisp/org-e-html.el: Replace org-e-html-quotes custom with org-e-html-quotes-replacements and enable org-e-html--quotation-marks, using org-export-quotation-marks function in org-export.el. * contrib/lisp/org-e-odt.el: Replace org-e-odt-quotes custom with org-e-odt-quotes-replacements and make org-e-odt--quotation-marks use org-export-quotations-marks function in org-export.el. --- contrib/lisp/org-e-html.el | 57 contrib/lisp/org-e-latex.el | 67 ++--- contrib/lisp/org-e-odt.el | 68 ++--- contrib/lisp/org-export.el | 38 lisp/org.el | 101 +++ 5 files changed, 203 insertions(+), 128 deletions(-) diff --git a/contrib/lisp/org-e-html.el b/contrib/lisp/org-e-html.el index 4287a59..c49608d 100644 --- a/contrib/lisp/org-e-html.el +++ b/contrib/lisp/org-e-html.el @@ -1043,37 +1043,24 @@ in order to mimic default behaviour: Plain text -(defcustom org-e-html-quotes - '((fr - (\\(\\s-\\|[[(]\\|^\\)\ . «~) - (\\(\\S-\\)\ . ~») - (\\(\\s-\\|(\\|^\\)' . ')) -(en - (\\(\\s-\\|[[(]\\|^\\)\ . ``) - (\\(\\S-\\)\ . '') - (\\(\\s-\\|(\\|^\\)' . `))) - Alist for quotes to use when converting english double-quotes. - -The CAR of each item in this alist is the language code. -The CDR of each item in this alist is a list of three CONS: -- the first CONS defines the opening quote; -- the second CONS defines the closing quote; -- the last CONS defines single quotes. - -For each item in a CONS, the first string is a regexp -for allowed characters before/after the quote, the second -string defines the replacement string for this quote. +(defcustom org-e-html-smart-quote-replacements + '((fr laquo;nbsp; nbsp;raquo; lsquo; rsquo; rsquo;) +(en ldquo; rdquo; lsquo; rsquo; rsquo;) +(de bdquo; ldquo; sbquo; lsquo; rsquo;)) + What to export for `smart-quotes'. +A list of five strings: + 1. Open double-quotes + 2. Close double-quotes + 3. Open single-quote + 4. Close single-quote + 5. 
Mid-word apostrophe :group 'org-export-e-html :type '(list - (cons :tag Opening quote - (string :tag Regexp for char before) - (string :tag Replacement quote )) - (cons :tag Closing quote - (string :tag Regexp for char after ) - (string :tag Replacement quote )) - (cons :tag Single quote - (string :tag Regexp for char before) - (string :tag Replacement quote + (string :tag Open double-quotes); â + (string :tag Close double-quotes) ; â + (string :tag Open single-quote) ; â + (string :tag Close single-quote); â + (string :tag Mid-word apostrophe))) ; â Compilation @@ -1459,15 +1446,7 @@ This is used to choose a separator for constructs like \\verb. Export quotation marks depending on language conventions. TEXT is a string containing quotation marks to be replaced. INFO is a plist used as a communication channel. - (mapc (lambda(l) - (let ((start 0)) - (while (setq start (string-match (car l) text start)) - (let
Re: [O] Smart Quotes Exporting
All right, preliminary patch is attached, *maybe* good enough for more serious consideration now, but might need some fixes. Still only uses ordinary regexps and plain-text strings, but can now handle the example with formatting-breaks next to quotes. Things have been moved into more appropriate locations, made customs, docstrings and types fixed, etc, etc.

It supports onscreen display of smart quotes (when enabled); I have the quotes displayed in org-document-info face so they are slightly distinct, to make it clearer that they are altered from what they are in the plain text. This may or may not be a popular (or good) idea. I have also built it into the new export engine in org-e-latex and org-e-html as proofs of concept. I'm not positive the latex one will work properly for German, though; there might need to be something enabled in LaTeX for it to format ,, into „.

It should probably be set not to smartify quotes onscreen in comments; I haven't done that yet.

Comments welcome; I hope I didn't complicate matters in the export engines too much.

~mark

From 1bc507cf69c94d5645436abc6e28e7d96999083e Mon Sep 17 00:00:00 2001
From: Mark Shoulson m...@kli.org
Date: Tue, 29 May 2012 23:01:12 -0400
Subject: [PATCH] Add `smart' quotes for onscreen display and for latex and html export

* lisp/org.el: Add `smart' quotes: custom variables to define regexps to recognize quotes, to define how and whether to display them, and org-fontify-quotes to display `smart-quote' characters when activated.

* contrib/lisp/org-export.el: Add function org-export-quotation-marks as a utility function usable by individual exporters to apply `smart' quotes.

* contrib/lisp/org-e-latex.el: Replace org-e-latex-quotes custom with org-e-latex-quotes-replacements and make org-e-latex--quotation-marks use the org-export-quotation-marks function in org-export.el.

* contrib/lisp/org-e-html.el: Replace org-e-html-quotes custom with org-e-html-quotes-replacements and enable org-e-html--quotation-marks, using org-export-quotation-marks function in org-export.el. --- contrib/lisp/org-e-html.el | 57 contrib/lisp/org-e-latex.el | 67 ++--- contrib/lisp/org-export.el | 26 +++ lisp/org.el | 101 +++ 4 files changed, 168 insertions(+), 83 deletions(-) diff --git a/contrib/lisp/org-e-html.el b/contrib/lisp/org-e-html.el index 53547a0..d4a505e 100644 --- a/contrib/lisp/org-e-html.el +++ b/contrib/lisp/org-e-html.el @@ -1077,37 +1077,24 @@ in order to mimic default behaviour: Plain text -(defcustom org-e-html-quotes - '((fr - (\\(\\s-\\|[[(]\\|^\\)\ . «~) - (\\(\\S-\\)\ . ~») - (\\(\\s-\\|(\\|^\\)' . ')) -(en - (\\(\\s-\\|[[(]\\|^\\)\ . ``) - (\\(\\S-\\)\ . '') - (\\(\\s-\\|(\\|^\\)' . `))) - Alist for quotes to use when converting english double-quotes. - -The CAR of each item in this alist is the language code. -The CDR of each item in this alist is a list of three CONS: -- the first CONS defines the opening quote; -- the second CONS defines the closing quote; -- the last CONS defines single quotes. - -For each item in a CONS, the first string is a regexp -for allowed characters before/after the quote, the second -string defines the replacement string for this quote. +(defcustom org-e-html-smart-quote-replacements + '((fr laquo;nbsp; nbsp;raquo; lsquo; rsquo; rsquo;) +(en ldquo; rdquo; lsquo; rsquo; rsquo;) +(de bdquo; ldquo; sbquo; lsquo; rsquo;)) + What to export for `smart-quotes'. +A list of five strings: + 1. Open double-quotes + 2. Close double-quotes + 3. Open single-quote + 4. Close single-quote + 5. 
Mid-word apostrophe :group 'org-export-e-html :type '(list - (cons :tag Opening quote - (string :tag Regexp for char before) - (string :tag Replacement quote )) - (cons :tag Closing quote - (string :tag Regexp for char after ) - (string :tag Replacement quote )) - (cons :tag Single quote - (string :tag Regexp for char before) - (string :tag Replacement quote + (string :tag Open double-quotes); â + (string :tag Close double-quotes) ; â + (string :tag Open single-quote) ; â + (string :tag Close single-quote); â + (string :tag Mid-word apostrophe))) ; â Compilation @@ -1497,15 +1484,7 @@ This is used to choose a separator for constructs like \\verb. Export quotation marks depending on language conventions. TEXT is a string containing quotation marks to be replaced. INFO is a plist used as a communication channel. - (mapc (lambda(l) - (let ((start 0)) - (while (setq start (string-match (car l) text start)) - (let ((new-quote (concat (match-string 1 text) (cdr l - (setq text (replace-match new-quote t t text)) - (cdr (or (assoc (plist-get info :language) org-e-html-quotes) - ;; Falls back on English. - (assoc en
Re: [O] Smart Quotes Exporting
Hello,

Mark E. Shoulson m...@kli.org writes:

> Oh, certainly; they're all a disaster. I think I said that in the writeup at the top. This is just proof of concept, nothing is in the right place, nothing is properly documented.
>
>> They have to be defcustoms, there needs to be a good :type in the defcustom as well as a proper docstring.
>
> You'll get no argument from me about the lack (or inaccuracy) of docstrings and such. I hadn't gotten that far yet. I said the patch was only if you wanted to tinker with the development as this progresses.

No worries, I was just making some comments before forgetting about them.

>>> +(defun org-e-latex--quotation-marks (text info)
>>> + (org-export-quotation-marks text info org-e-latex-quote-replacements))
>>> + ;; (mapc (lambda(l)
>>> + ;; (let ((start 0))
>>> + ;; (while (setq start (string-match (car l) text start))
>>> + ;; (let ((new-quote (concat (match-string 1 text) (cdr l
>>> + ;; (setq text (replace-match new-quote t t text))
>>> + ;; (cdr (or (assoc (plist-get info :language) org-e-latex-quotes)
>>> + ;; ;; Falls back on English.
>>> + ;; (assoc "en" org-e-latex-quotes
>>> + ;; text)
>>
>> Use directly `org-e-latex-quote-replacements' in code then.
>
> Not sure I understand this comment.

Since `org-e-latex--quotation-marks' just calls `org-export-quotation-marks', you can remove the former completely from org-export.el and use the latter instead.

> So... there's the filter-parse-tree-functions hook that gets applied within the parse tree... so a back-end can add a function to that list which looks over the parse-tree and watches for these border cases (and also the ones within ordinary strings). Looks like it's going to be tough to work in any flexibility to define further per-language or per-backend cleverness to handle anything beyond the canonical set of open-double, close-double, open-single, close-single, and mid-word.
>
> To be sure, anything we do will most assuredly fail even on some fairly reasonable input, in which case the users are pretty much on their own and will have to do things the hard way. And I could use that as the answer here, that, well, it'll work only within plain-text strings (and I might possibly still have to use that answer), but I would rather include the situations you bring up in the supported set and not throw up my hands at it. So, yes, will look at that.

Actually it isn't very hard to handle this problem. But it will be different from the fontification used in an Org buffer.

You might want to look at `org-element-normalize-contents', which solves a similar problem: removing maximum common indentation at the parsed paragraph level.

As a first approximation, I can imagine a function accepting an element, an object or a secondary string and returning an equivalent element, object or secondary string, with its quotes smartified. The algorithm could go like this:

Walk element/object/secondary-string's contents.

1. When a string is encountered:

   1. If it has a quote as its first or last position, check for objects before or after the string to guess its status. An object never starts with a white space, but you may have to check the :post-blank property in order to know if the previous object had white spaces at its end.

   2. For each quote everywhere else in the string, your regexp can handle it fine.

2. When an object belonging to `org-element-recursive-objects' is encountered, apply the function to this object.

3. Accumulate returned strings or objects.

Use accumulated data as the contents of the new object to return (i.e. just add the type and the same properties at the beginning of this list if it was an object or an element, return it as-is if that was a secondary string).

On the elements side, only paragraphs, verse-blocks and table-rows can directly contain quotes. Also, headline, inlinetask, item and footnote-reference have secondary strings containing quotes.

I'm not sure yet where and how to install such a function, but I will think about it when it is implemented.

Regards,

--
Nicolas Goaziou
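A minimal sketch of the walk described above, under loud assumptions: it does not use the real Org parser, it models contents as a plain list of strings and objects shaped like (TYPE PROPERTIES . CONTENTS), and every `my/' name is hypothetical (`my/recursive-objects' merely stands in for `org-element-recursive-objects'). It also uses a deliberately crude rule, handling only double quotes and treating a quote as opening when the character before it is blank, so only the "before" context is consulted.

```elisp
;;; -*- lexical-binding: t; -*-
;; Sketch only: the `my/' names are hypothetical, not Org functions.

(defvar my/recursive-objects '(bold italic underline link)
  "Stand-in for `org-element-recursive-objects' (assumption).")

(defun my/blank-p (char)
  "Non-nil if CHAR is nil (no neighbour), whitespace or an open paren."
  (or (null char) (memq char (string-to-list " \t\n("))))

(defun my/smartify-string (string before)
  "Return STRING with each straight double quote made curly.
BEFORE is the character preceding STRING in the paragraph, or nil."
  (let ((out '()))
    (dotimes (i (length string))
      (let ((c (aref string i)))
        (if (/= c ?\")
            (push c out)
          ;; At position 0 the context comes from the previous sibling.
          (let ((prev (if (> i 0) (aref string (1- i)) before)))
            (push (if (my/blank-p prev) ?“ ?”) out)))))
    (concat (nreverse out))))

(defun my/last-char (node)
  "Return a character describing how NODE ends, or nil."
  (cond ((stringp node)
         (and (> (length node) 0) (aref node (1- (length node)))))
        ;; For an object only its trailing blanks matter.
        ((> (or (plist-get (nth 1 node) :post-blank) 0) 0) ?\s)
        (t ?x)))

(defun my/smartify-contents (contents &optional before)
  "Return a smartified copy of CONTENTS, a list of strings and objects."
  (let ((result '()))
    (dolist (item contents)
      (push (cond ((stringp item) (my/smartify-string item before))
                  ((memq (car item) my/recursive-objects)
                   ;; Same type and properties, new contents.
                   (append (list (car item) (nth 1 item))
                           (my/smartify-contents (nthcdr 2 item) before)))
                  (t item))
            result)
      (setq before (my/last-char item)))
    (nreverse result)))
```

With the contents of `say "*very*" and /"so"/ on', the quote that follows the bold object is at position 0 of its string, so only the :post-blank check on the previous object lets it come out as a closing quote.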
Re: [O] Smart Quotes Exporting
On 06/01/2012 01:11 PM, Nicolas Goaziou wrote:

> Hello,
>
> Mark E. Shoulson m...@kli.org writes:
>
>> Oh, certainly; they're all a disaster. I think I said that in the writeup at the top. This is just proof of concept, nothing is in the right place, nothing is properly documented.
>>
>>> They have to be defcustoms, there needs to be a good :type in the defcustom as well as a proper docstring.
>>
>> You'll get no argument from me about the lack (or inaccuracy) of docstrings and such. I hadn't gotten that far yet. I said the patch was only if you wanted to tinker with the development as this progresses.
>
> No worries, I was just making some comments before forgetting about them.

Ah, ok. Good! Thanks.

>>>> +(defun org-e-latex--quotation-marks (text info)
>>>> + (org-export-quotation-marks text info org-e-latex-quote-replacements))
>>>> + ;; (mapc (lambda(l)
>>>> + ;; (let ((start 0))
>>>> + ;; (while (setq start (string-match (car l) text start))
>>>> + ;; (let ((new-quote (concat (match-string 1 text) (cdr l
>>>> + ;; (setq text (replace-match new-quote t t text))
>>>> + ;; (cdr (or (assoc (plist-get info :language) org-e-latex-quotes)
>>>> + ;; ;; Falls back on English.
>>>> + ;; (assoc "en" org-e-latex-quotes
>>>> + ;; text)
>>>
>>> Use directly `org-e-latex-quote-replacements' in code then.
>>
>> Not sure I understand this comment.
>
> Since `org-e-latex--quotation-marks' just calls `org-export-quotation-marks', you can remove completely the former from org-export.el and use the latter instead.

Well, that was done on purpose, and maybe the reason will make sense. As I see it, each exporter should be able to have its own smartifier function, and the export engine should make no assumptions about that: just call the individual exporter's function. On the other hand, many (but perhaps not all!) of the exporters may find themselves using essentially the same code just with different replacement strings. So I thought that the general-purpose function should be in org-export.el, just for the convenience of exporters should they choose to make use of it. So, many of the exporters' smartifier functions will really just be calls to the more general-purpose function. Does that make sense?

>> So... there's the filter-parse-tree-functions hook gets applied within the parse tree... so a back-end can add a function to that list which looks over the parse-tree and watches for these border cases (and also the ones within ordinary strings). Looks like it's going to be tough to work in any flexibility to define further per-language or per-backend cleverness to handle anything beyond the canonical set of open-double, close-double, open-single, close-single, and mid-word.
>>
>> To be sure, anything we do will most assuredly fail even on some fairly reasonable input, in which case the users are pretty much on their own and will have to do things the hard way. And I could use that as the answer here, that, well, it'll work only within plain-text strings (and I might possibly still have to use that answer), but I would rather include the situations you bring up in the supported set and not throw up my hands at it. So, yes, will look at that.
>
> Actually it isn't very hard to handle this problem. But it will be different than the fontification used in an Org buffer.

Yes, the fontification on-screen is different, and uses a rather different function--but if I can help it, the same regexps! So things work the same everywhere. I also started thinking a little about what you write below, how we can inspect the characters just after or before quotes at the very beginning or end of each chunk. It would be nice if it could all be encapsulated neatly in the regexp(s).

> As a first approximation, I can imagine a function accepting an element, an object or a secondary string and returning an equivalent element, object or secondary string, with its quotes smartified. The algorithm could go like this:
>
> Walk element/object/secondary-string's contents.

Need it be element/object/secondary-string? At the bottom level it's always about strings; the higher levels don't affect the processing of each string in isolation. Do we need to intercept it at the element level or just wait to grab things in the plain-text filter, since we have access at that point too? (Might also be that my understanding of the process and the nature of elements is faulty or limited. Will have to see what works.)

> 1. When a string is encountered:
>
>    1. If it has a quote as its first or last position, check for objects before or after the string to guess its status. An object never starts with a white space, but you may have to check :post-blank property in order to know if previous object had white spaces at its end.

Hmm, this may in fact answer my question above: you need to be able to get at the object level to test the post-blank. I'll experiment.

>    2. For each quote everywhere else in the string,
Re: [O] Smart quotes
Hello,

Mark E. Shoulson m...@kli.org writes:

> Maybe, if it's all on one line. But if the quote is several lines long, can you sensibly count the levels?

Well, yes.

> I guess it doesn't actually matter, but it starts to get weird if you find yourself looking arbitrarily far back, and then you start building in exceptions for crossing paragraph boundaries...

True. I had the exporter in mind, where you always start at the beginning of the paragraph. It would be more difficult with a search starting in the middle of the paragraph.

> And then there's the fact that multi-paragraph quotes usually have an open-quote for each paragraph but only one close-quote at the end...

Some French typographers suggest using a close-quote at the beginning of the paragraph to avoid that confusion, or simply dropping them (since they are a pain to maintain anyway). I don't know about other languages but, if that's the same, is it a good idea to bother implementing it?

> Actually keeping count of what level you're at, accurately, is a classic example of a non-regular language; you need a push-down automaton to keep count, and regular expressions don't cut it.

This is limited to 2 levels.

> I'm rambling. In sum, I'm going to start off /not/ trying to solve that problem, and assume the writer is going to use alternating " and ' as typography requires and not try to second-guess what level we're at.

You are right, the problem will be easier to solve with both " and '. Though, "as typography requires" is not true. In France, the /Imprimerie Nationale/ suggests using guillemets at both levels. Remember that typography is localized, which is the main difficulty of the implementation.

Regards,

--
Nicolas Goaziou
Re: [O] Smart quotes
On 05/29/2012 01:57 PM, Nicolas Goaziou wrote:

> Hello,
>
> Mark E. Shoulson m...@kli.org writes:
>
>> I guess it doesn't actually matter, but it starts to get weird if you find yourself looking arbitrarily far back, and then you start building in exceptions for crossing paragraph boundaries...
>
> True. I had the exporter in mind, where you always start at the beginning of the paragraph. It would be more difficult with search starting in the middle of the paragraph.

Maybe the on-screen stuff is no harder; will just have to see.

>> And then there's the fact that multi-paragraph quotes usually have an open-quote for each paragraph but only one close-quote at the end...
>
> Some french typographers suggest to use a close-quote at the beginning of the paragraph to avoid that confusion, or to simply drop them (since they are a pain to maintain anyway). I don't know about other languages but, if that's the same, is it a good idea to bother implementing it?

I've never heard of it. But I think we may be overthinking this; we can drive ourselves crazy trying to compress a dozen different typographical traditions (and informal customs) into a few Elisp rules. On the other hand, I don't think we need to throw up our hands and give up either! :)

>> Actually keeping count of what level you're at, accurately, is a classic example of a non-regular language; you need a push-down automaton to keep count, and regular expressions don't cut it.
>
> This is limited to 2 levels.

True.

>> I'm rambling. In sum, I'm going to start off /not/ trying to solve that problem, and assume the writer is going to use alternating " and ' as typography requires and not try to second-guess what level we're at.
>
> You are right, the problem will be easier to solve with both " and '. Though, "as typography requires" is not true. In France, the /Imprimerie Nationale/ suggests to use guillemets at both levels. Remember that typography is localized, which is the main difficulty of the implementation.

Also a good point.

All right, bottom line, this is sort of what I'm seeing. I'm not 100% sure which files should house these things, but something like this:

1) a variable containing, for each language, a regexp for each of: open double-quote, close double-quote, open single-quote, close single-quote, and maybe mid-word apostrophe. Odds are these regexps are going to be the same for just about all languages (the regexps detecting them, mind you), so there should probably be some sort of default that the alist can just reference. A language should also be allowed to define other quote regexps in its list too. We need these to be ordered, with a standard set, so that we can have...

2) for each *exporter* (including on-screen display), a variable that defines, for each language, what the *substitution* will be for open-double-quote, close-double-quote, etc. Other extras can be defined too. That way we can have an exporter-independent way to detect quotes to be smartified, but each exporter has its own way to smartify them.

3) Since most exporters are probably going to handle the process in approximately the same way (match the regexp, stick in the associated substitution), org-export.el should have a generic function that does this, which each exporter *may* call in (or as) its quote-smartifier in its text translator, unless it needs something more specific which it can provide itself.

In terms of what is handled, the idea in my head is that we would expect the writer to be using " or ' to surround their quotes, regardless of what their native custom is (if they're doing it using their language-specific quote-marks, we don't need to bother with all this anyway). Goal is to handle either "quotes" or 'quotes' in either nesting (or no nesting, if someone does "quote' for some reason), and with any luck not get too confused with other uses of apostrophe.

It makes sense to me, but I bet I explained it badly and people are going to have all kinds of issues with it. :)

No telling when (if?) I'll be able to produce something along these lines, but it's something to start thinking about anyway.

~mark
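The three-part layout proposed above could be sketched roughly as follows. This is an illustration under assumptions, not the patch under discussion: all `my/' names, the regexps and the replacement tables are hypothetical. The "fr" entry uses guillemets with no-break spaces at both levels, an assumption based on the /Imprimerie Nationale/ remark, and the passes run one after another in list order, so a replacement string that itself contains a straight quote may be rewritten by a later rule.

```elisp
;;; -*- lexical-binding: t; -*-
;; Sketch only: every `my/' name is hypothetical, not an Org API.

;; 1) Exporter-independent detection, in application order.
;;    Groups 1 and 2, when present, are context that is kept.
(defvar my/quote-regexps
  '((open-single  . "\\(\\`\\|[ \t\n(\"]\\)'")
    (apostrophe   . "\\([[:alnum:]]\\)'\\([[:alnum:]]\\)")
    (close-single . "'")
    (open-double  . "\\(\\`\\|[ \t\n(]\\)\"")
    (close-double . "\""))
  "Ordered alist of (KIND . REGEXP) shared by all languages.")

;; 2) Per-exporter, per-language substitutions.
(defvar my/utf8-quote-replacements
  '(("en" (open-double . "“") (close-double . "”")
          (open-single . "‘") (close-single . "’")
          (apostrophe . "’"))
    ("fr" (open-double . "«\u00a0") (close-double . "\u00a0»")
          (open-single . "«\u00a0") (close-single . "\u00a0»")
          (apostrophe . "’")))
  "Replacements for a UTF-8 based exporter (illustrative values).")

(defvar my/latex-quote-replacements
  '(("en" (open-double . "``") (close-double . "''")
          (open-single . "`") (close-single . "'")
          (apostrophe . "'")))
  "Replacements for a LaTeX exporter (illustrative values).")

(defun my/replacements-for (language table)
  "Return the entry for LANGUAGE in TABLE, falling back on English."
  (cdr (or (assoc language table) (assoc "en" table))))

;; 3) Generic function that an exporter may call as its smartifier.
(defun my/smartify-quotes (text replacements)
  "Return TEXT with straight quotes replaced per REPLACEMENTS."
  (dolist (rule my/quote-regexps text)
    (let ((new (cdr (assq (car rule) replacements))))
      (when new
        (setq text
              (replace-regexp-in-string
               (cdr rule)
               (lambda (m)
                 (concat (or (match-string 1 m) "")
                         new
                         (or (match-string 2 m) "")))
               text t t))))))
```

An exporter would then call, for example, (my/smartify-quotes text (my/replacements-for "en" my/latex-quote-replacements)), while one with special needs could ignore the generic function entirely.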
Re: [O] Smart quotes
On 05/26/2012 02:48 AM, Nicolas Goaziou wrote:

> Hello,
>
> Mark E. Shoulson m...@kli.org writes:
>
>>> The regexp may be able to tell level 1 from level 2 quotes.
>>
>> Do you mean that the author would use the same characters for both first and second level quotes, and the regexp would be smart enough to distinguish which level each was at? I don't think that's possible, and you probably don't either.
>
> Actually, I do. Since you can tell an opening quote from a closing one by the position of the white space (or parenthesis, beginning/end of line) near it, I think you can deduce the quote level. I may be wrong, though.

Maybe, if it's all on one line. But if the quote is several lines long, can you sensibly count the levels? I guess it doesn't actually matter, but it starts to get weird if you find yourself looking arbitrarily far back, and then you start building in exceptions for crossing paragraph boundaries... And then there's the fact that multi-paragraph quotes usually have an open-quote for each paragraph but only one close-quote at the end...

Actually keeping count of what level you're at, accurately, is a classic example of a non-regular language; you need a push-down automaton to keep count, and regular expressions don't cut it. Then again, Emacs regexps are more powerful than simple regular expressions, and we only would want to keep track of even vs odd level anyway.

I'm rambling. In sum, I'm going to start off /not/ trying to solve that problem, and assume the writer is going to use alternating " and ' as typography requires and not try to second-guess what level we're at. As that progresses, maybe I'll come to understand better what can and can't (and should and shouldn't) be deduced by the regexps.

>> "this is a 'quote', and that's all you need to know."
>>
>> becoming, for instance
>>
>> «this is a ‹quote›, and that’s all you need to know.»
>
> "this is a "quote", and that's all you need to know" is as parsable to me. As a side note, at least in French, many typographers would recommend "this is a /quote/, and that's all you need to know" here. Oh, and I know that was just an example.

I see; because I can tell that the second " must be an open-quote and not closing the first, due to its position relative to the spaces. It does seem possible, but I think I'm going to try not solving that problem first. (And French typography raises other problems, since French puts lots of space around the quote-marks, to the extent that French typists typing plain-text will often put a space on both sides of a quote-mark, making it hard to see whether it opens or closes... another issue, not necessarily solvable, to watch for.)

~mark
Re: [O] Smart quotes
Hello,

Mark E. Shoulson m...@kli.org writes:

>> The regexp may be able to tell level 1 from level 2 quotes.
>
> Do you mean that the author would use the same characters for both first and second level quotes, and the regexp would be smart enough to distinguish which level each was at? I don't think that's possible, and you probably don't either.

Actually, I do. Since you can tell an opening quote from a closing one by the position of the white space (or parenthesis, beginning/end of line) near it, I think you can deduce the quote level. I may be wrong, though.

> "this is a 'quote', and that's all you need to know."
>
> becoming, for instance
>
> «this is a ‹quote›, and that’s all you need to know.»

"this is a "quote", and that's all you need to know" is as parsable to me. As a side note, at least in French, many typographers would recommend "this is a /quote/, and that's all you need to know" here. Oh, and I know that was just an example.

> I'd love to get org more export-friendly. I'll see what I can understand of the (new) export code.

Do not hesitate to ask questions about it.

Regards,

--
Nicolas Goaziou
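The whitespace argument above can be tried out with a small sketch. It is only an illustration with hypothetical `my/' helpers: a quote counts as opening when blank (or the start of the text, or an open parenthesis) precedes it and something non-blank follows, as closing in the mirrored situation, and the nesting level is then just a counter. Quotes adjacent to other quotes, and the French habit of putting a space on both sides, are left as `unknown'.

```elisp
;;; -*- lexical-binding: t; -*-
;; Sketch only: both functions are hypothetical helpers, not Org code.

(defun my/quote-role (text pos)
  "Classify the straight quote at POS in TEXT.
Return `open', `close', `apostrophe' or `unknown'."
  (let* ((prev (and (> pos 0) (aref text (1- pos))))
         (next (and (< (1+ pos) (length text)) (aref text (1+ pos))))
         (prev-blank (or (null prev)
                         (memq prev (string-to-list " \t\n("))))
         (next-blank (or (null next)
                         (memq next (string-to-list " \t\n).,;:!?")))))
    (cond ((and prev-blank (not next-blank)) 'open)
          ((and (not prev-blank) next-blank) 'close)
          ((and (not prev-blank) (not next-blank)
                (eq (aref text pos) ?\'))
           'apostrophe)
          (t 'unknown))))

(defun my/quote-levels (text)
  "Return a list of (POS ROLE LEVEL) for each straight quote in TEXT.
LEVEL is the nesting depth for `open' and `close', nil otherwise."
  (let ((depth 0) (start 0) (result '()))
    (while (string-match "[\"']" text start)
      (let* ((pos (match-beginning 0))
             (role (my/quote-role text pos)))
        (cond ((eq role 'open)
               (setq depth (1+ depth))
               (push (list pos role depth) result))
              ((eq role 'close)
               (push (list pos role depth) result)
               (setq depth (max 0 (1- depth))))
              (t (push (list pos role nil) result)))
        (setq start (1+ pos))))
    (nreverse result)))
```

On the example sentence written with " at both levels, this yields levels 1, 2, 2, 1 for the four double quotes and flags the quote in "that's" as an apostrophe, which is the deduction claimed above; it says nothing about how often real text breaks the heuristic.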
Re: [O] Smart quotes
Hello,

Mark E. Shoulson m...@kli.org writes:

> Hm. I like the idea, but it raises some questions for me. It would be particularly good if this could share code/custom variables with the pieces of the (new) exporter that make smart quotes on export. That way we could be sure that what it looks like onscreen would also be what it looked like when exported.

It could be interesting, but keep in mind that no matter how smart your quotes are, they will fail in some situations. So, it will have to be optional for export, independently of their in-buffer status. The OPTIONS keyword may be used, with q:t and q:nil items.

> Looking at contrib/lisp/org-e-latex.el at an upcoming exporter for such things, I see a variable org-e-latex-quotes, which has nice language-aware parts... but misses an important point. Each language gets to define one regexp for opening quotes, one for closing quotes, and one for single quotes. But don't we want to talk about (at least) two levels of quotes, see your own reference[fn:1]?

Probably. But that's going to be somewhat harder.

> Single quotes would be for inner, second-level quotes (if we're using double straight quotes according to (American) English usage, I would guess we'd be using single straight quotes the same way). That works okay for English, where a single apostrophe not part of a grouping construct is going to be interpreted as a close single quote and look right for an apostrophe.

The regexp may be able to tell level 1 from level 2 quotes.

> It might not work so good in French where apostrophes are also used,

There are no spaces around apostrophes, so they shouldn't be caught by the regexp.

> but also single guillemets for inner-level quotes.

What are single guillemets? I don't think there is such a thing in French.

> Should/can we consider extending this for the new exporters?

I think it would be a good addition to the export mechanism, if you want to give it a try.

> (I'm looking forward to HTML and ODT exporters that can do smart quotes; the straight quotes are really the main jarring things about using Org as a lightweight markup and exporting into something fancier)

A function, provided in org-export, could help change dumb quotes into smart quotes in plain text. Then, it would be easier for back-ends to provide the feature, if they wanted to.

Regards,

--
Nicolas Goaziou
Re: [O] Smart quotes
> It could be interesting, but keep in mind that no matter how smart your quotes are, they will fail in some situations. So, it will have to be optional for export, independently of their in-buffer status. The OPTIONS keyword may be used, with q:t and q:nil items.

I don't see an entry for this in `org-export-options-alist'. So I believe you are soliciting opinions on a fresh addition.

>> (I'm looking forward to HTML and ODT exporters that can do smart quotes; the straight quotes are really the main jarring things about using Org as a lightweight markup and exporting into something fancier)
>
> A function, provided in org-export, could help changing dumb quotes into smart quotes in plain text. Then, it would be easier for back-ends to provide the feature, if they wanted to.

I can use it, if made available. I think it will help if we force all exporters to produce utf-8 files.

--
Re: [O] Smart quotes
On 05/25/2012 01:14 PM, Nicolas Goaziou wrote:

> Hello,
>
> Mark E. Shoulson m...@kli.org writes:
>
>> Hm. I like the idea, but it raises some questions for me. It would be particularly good if this could share code/custom variables with the pieces of the (new) exporter that make smart quotes on export. That way we could be sure that what it looks like onscreen would also be what it looked like when exported.
>
> It could be interesting, but keep in mind that no matter how smart your quotes are, they will fail in some situations. So, it will have to be optional for export, independently of their in-buffer status. The OPTIONS keyword may be used, with q:t and q:nil items.

Smart quotes absolutely have to be optional, and probably disabled by default. They're going to fail sometimes, so they should only be there when you ask for them. Smart-quotes-for-export and smart-quotes-onscreen need to be settable independently, yes. Smart-quotes-for-export needs to be settable per-file/per-buffer, with OPTIONS or something. Smart-quotes-onscreen doesn't have to be buffer-local, though it might be a good idea. Using q:t or maybe :t in options seems perfectly good for setting exporting smart quotes.

It still would be good if onscreen and export could share code.

>> Looking at contrib/lisp/org-e-latex.el at an upcoming exporter for such things, I see a variable org-e-latex-quotes, which has nice language-aware parts... but misses an important point. Each language gets to define one regexp for opening quotes, one for closing quotes, and one for single quotes. But don't we want to talk about (at least) two levels of quotes, see your own reference[fn:1]?
>
> Probably. But that's going to be somewhat harder.
>
>> Single quotes would be for inner, second-level quotes (if we're using double straight quotes according to (American) English usage, I would guess we'd be using single straight quotes the same way). That works okay for English, where a single apostrophe not part of a grouping construct is going to be interpreted as a close single quote and look right for an apostrophe.
>
> The regexp may be able to tell level 1 from level 2 quotes.

Do you mean that the author would use the same characters for both first and second level quotes, and the regexp would be smart enough to distinguish which level each was at? I don't think that's possible, and you probably don't either. What I meant, and you probably did as well, was that if we use apostrophes for second-level quotes, a regexp can be smart enough to tell the difference between a second-level quote and a non-quote apostrophe.

>> It might not work so good in French where apostrophes are also used,
>
> There are no spaces around apostrophes, so they shouldn't be caught by the regexp.

which is what you say here. They *should* be caught by a regexp, but not the same one; they need to be smartified also, just not necessarily treated the same as second-level quotes.

>> but also single guillemets for inner-level quotes.
>
> What are single guillemets? I don't think there is such thing in French.

You're right; the Wikipedia page says that French uses quote-marks or the same double-chevrons for inner quotes. I thought it used \lsaquo and \rsaquo, « like ‹ this › ». Looks like it does in Swiss typography for various languages, according to the page. Danish also uses the single-chevrons (pointing the other direction), and Azerbaijani and Basque, etc... Whatever. What I meant was, if people are going to be writing using straight ascii quotes and expect them to be changed into language-appropriate quotes, they're going to want something like

"this is a 'quote', and that's all you need to know."

becoming, for instance

«this is a ‹quote›, and that’s all you need to know.»

that is, it should be possible to use the single quotes for inner quotes, which would mean more than just opening/closing/single in the org-e-latex-quotes (and analogous variables in other exporters). Being able to determine when you need ‹› and when ’ might be a little uncertain, but it isn't hard to make a regexp that can make a decent guess at it.

>> Should/can we consider extending this for the new exporters?
>
> I think it would be a good addition to the export mechanism, if you want to give it a try.

I'd love to get org more export-friendly. I'll see what I can understand of the (new) export code.

>> (I'm looking forward to HTML and ODT exporters that can do smart quotes; the straight quotes are really the main jarring things about using Org as a lightweight markup and exporting into something fancier)
>
> A function, provided in org-export, could help changing dumb quotes into smart quotes in plain text. Then, it would be easier for back-ends to provide the feature, if they wanted to.

That sounds like a possibility, might make for good generic handling, only one bit of code to treat everything consistently... yeah, I didn't like the idea at first, I'm starting to like it more.
Re: [O] Smart quotes
Hello,

Mark E. Shoulson m...@kli.org writes:

> Smart quotes can be annoying when they aren't smart enough. But when they work you can miss them. I'm attaching a patch that defines a custom variable org-smart-quotes (nil by default), which when non-nil causes the " and ' characters to display as “smart” quotes, hopefully the right ones. They're still ' and " in the underlying text, just overlaid with “”.

This is not related to entities, so code shouldn't be in org-entities.el.

Also, quotes are dependent on locale[fn:1]. English/US only quotes look like a niche to me. Would it be possible to modify the patch and have this feature handle the LANGUAGE keyword, or at least have support for it?

Regards,

[fn:1] https://en.wikipedia.org/wiki/Non-English_usage_of_quotation_marks

--
Nicolas Goaziou
Re: [O] Smart quotes
On 05/23/2012 06:17 PM, Nicolas Goaziou wrote:

> Hello,
>
> Mark E. Shoulson m...@kli.org writes:
>
>> Smart quotes can be annoying when they aren't smart enough. But when they work you can miss them. I'm attaching a patch that defines a custom variable org-smart-quotes (nil by default), which when non-nil causes the " and ' characters to display as “smart” quotes, hopefully the right ones. They're still ' and " in the underlying text, just overlaid with “”.
>
> This is not related to entities, so code shouldn't be in org-entities.el.

Agreed.

> Also, quotes are dependent on locale[fn:1]. English/US only quotes look like a niche to me. Would it be possible to modify the patch and have this feature handle LANGUAGE keyword, or at least have a support for it?

Hm. I like the idea, but it raises some questions for me. It would be particularly good if this could share code/custom variables with the pieces of the (new) exporter that make smart quotes on export. That way we could be sure that what it looks like onscreen would also be what it looked like when exported.

Looking at contrib/lisp/org-e-latex.el at an upcoming exporter for such things, I see a variable org-e-latex-quotes, which has nice language-aware parts... but misses an important point. Each language gets to define one regexp for opening quotes, one for closing quotes, and one for single quotes. But don't we want to talk about (at least) two levels of quotes, see your own reference[fn:1]? Single quotes would be for inner, second-level quotes (if we're using double straight quotes according to (American) English usage, I would guess we'd be using single straight quotes the same way). That works okay for English, where a single apostrophe not part of a grouping construct is going to be interpreted as a close single quote and look right for an apostrophe. It might not work so good in French where apostrophes are also used, but also single guillemets for inner-level quotes. Does the setup there need to be smarter, or at least more extensible, to allow for more than exactly three entries? Clever enough regexps could distinguish inner quotes from apostrophes, etc.

Should/can we consider extending this for the new exporters?

(I'm looking forward to HTML and ODT exporters that can do smart quotes; the straight quotes are really the main jarring things about using Org as a lightweight markup and exporting into something fancier)

~mark
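For readers wondering how quotes can look curly on screen while the buffer text stays untouched, here is one possible mechanism, sketched under assumptions. The attached patch is not shown in this thread, so this is not its code: `my/display-smart-quotes' is a hypothetical function that uses the standard `display' text property and the same crude blank-before rule for English-style quotes.

```elisp
;;; -*- lexical-binding: t; -*-
;; Sketch only: a hypothetical function, not the code from the patch.

(defun my/display-smart-quotes (beg end)
  "Show straight quotes between BEG and END as curly ones.
Only a `display' text property is added; the text is unchanged."
  (save-excursion
    (goto-char beg)
    (while (re-search-forward "[\"']" end t)
      (let* ((pos (match-beginning 0))
             (prev (char-before pos))
             ;; Opening if at buffer start or after blank/open paren.
             (open (or (null prev)
                       (memq prev (string-to-list " \t\n("))))
             (double (eq (char-after pos) ?\")))
        (put-text-property
         pos (1+ pos) 'display
         (cond ((and double open) "“")
               (double "”")
               (open "‘")
               (t "’")))))))
```

Calling it on a region leaves the underlying " and ' characters in place, so saving or exporting the file sees straight quotes. A real implementation would more likely hook into font-lock so the properties follow edits.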