Re: [O] [BUG] Mark-up handling chokes on unicode whitespace

2014-09-23 Thread Aaron Ecay
Hi Tobias,

On 23 September 2014, Tobias Getzner wrote:
> 
> Hello Aaron!
> 
> On Tue, 23 Sep 2014 13:03:06 -0400, Aaron Ecay wrote:
> 
>> You will need to change the variable org-emphasis-regexp-components; see
>> the documentation thereof.
> 
> Thank you very much! This seems to do it.
> 
> Might I suggest adding Unicode whitespace to the default? That variable
> seems a bit opaque and I would probably never have discovered it on my
> own; it also appears that one has to ensure it is set before org-mode
> is «required», and one cannot easily extend the default without also
> setting the rest. For typesetting purposes, at least the class of
> non-breaking whitespace is very useful.

org-emphasis-regexp-components is known to be a wart.  You can search
for posts on the mailing list.  Some people are trying to figure out how
to get rid of it.  (You can search in particular for Nicolas Goaziou’s
posts...)  Here’s one thread where you can see the lay of the land:
.

All that to say, the longer-term solution is to figure out some radically
different approach.  In the meantime though, if you can provide a list of
characters (by unicode name and/or code point) that you think should be
added to that variable, someone might be able to add them.  (I probably
would not make such a change on my own, but would wait for feedback from
Nicolas, Bastien, or one of the other maintainer-esque figures on the
list).  On the other hand, they might say “making such a change in org’s
core is just rearranging the deck chairs on the Titanic,” which would
also be a reasonable position for them to take IMO.

> 
> At first I thought it might be easy to cleanly solve such problems by 
> using the whitespace character class throughout, but to my chagrin it 
> seems that at least «search-forward-regexp» will only match 8-bit 
> whitespace this way, so I suppose Emacs regex isn’t aware of non-ASCII 
> whitespace? :'|

I don’t really know anything about this...it’s unfortunate if true
though.

-- 
Aaron Ecay



Re: [O] [BUG] Mark-up handling chokes on unicode whitespace

2014-09-23 Thread Tobias Getzner
Hello Aaron!

On Tue, 23 Sep 2014 13:03:06 -0400, Aaron Ecay wrote:

> On 23 September 2014, Tobias Getzner wrote:
>> 
>> When mark-up such as =monospace=, /italic/, etc. is preceded by a
>> non-8-bit whitespace, e.g., «narrow no-break space» (U+202F) or
>> «no-break space» (U+00A0), org-mode will not recognize the mark-up
>> content correctly
> 
> You will need to change the variable org-emphasis-regexp-components; see
> the documentation thereof.

Thank you very much! This seems to do it.

Might I suggest adding Unicode whitespace to the default? That variable
seems a bit opaque and I would probably never have discovered it on my
own; it also appears that one has to ensure it is set before org-mode
is «required», and one cannot easily extend the default without also
setting the rest. For typesetting purposes, at least the class of
non-breaking whitespace is very useful.

At first I thought it might be easy to cleanly solve such problems by 
using the whitespace character class throughout, but to my chagrin it 
seems that at least «search-forward-regexp» will only match 8-bit 
whitespace this way, so I suppose Emacs regex isn’t aware of non-ASCII 
whitespace? :'| 
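
One possible workaround, sketched here under the assumption that only
the two code points named in this thread matter, is to list the
characters explicitly in a bracket expression instead of relying on a
whitespace class. (In Emacs, [[:space:]] and \s- match characters that
have whitespace syntax in the current syntax table, so the result
depends on the buffer's mode.) The name my/ws-regexp is hypothetical.

```elisp
;; Hypothetical name; the character list is an illustrative assumption
;; covering only the code points mentioned in this thread.
(defconst my/ws-regexp "[ \t\u00A0\u202F]"
  "Match a space, a tab, NO-BREAK SPACE, or NARROW NO-BREAK SPACE.")

;; Returns the index of the first match, or nil if there is none.
(string-match-p my/ws-regexp "x\u00A0y")
```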

Best,
Tobias




Re: [O] [BUG] Mark-up handling chokes on unicode whitespace

2014-09-23 Thread Aaron Ecay
Hi Tobias,

On 23 September 2014, Tobias Getzner wrote:
> 
> When mark-up such as =monospace=, /italic/, etc. is preceded by a
> non-8-bit whitespace, e.g., «narrow no-break space» (U+202F) or «no-break
> space» (U+00A0), org-mode will not recognize the mark-up content
> correctly; i.e., this content will fail to be syntax-highlighted, and
> the mark-up syntax will be exported verbatim by the exporter.

You will need to change the variable org-emphasis-regexp-components; see
the documentation thereof.
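
For illustration, a minimal sketch of such a change follows. It assumes
the layout described in the variable's docstring (the first two list
elements are the PRE and POST character sets) and that org-set-emph-re
is available to rebuild the derived regexps; neither is confirmed in
this thread, so check both against your Org version. The two added code
points are the ones named in the report.

```elisp
;; Sketch only: extend PRE and POST with U+00A0 and U+202F instead of
;; restating the whole default value.
;; `with-eval-after-load' needs Emacs 24.4 or later; on older versions
;; `eval-after-load' with a quoted form plays the same role.
(with-eval-after-load 'org
  (let ((extra "\u00A0\u202F"))
    ;; Assumption: element 0 is PRE, the characters allowed before markup.
    (setcar org-emphasis-regexp-components
            (concat (car org-emphasis-regexp-components) extra))
    ;; Assumption: element 1 is POST, the characters allowed after markup.
    (setcar (cdr org-emphasis-regexp-components)
            (concat (cadr org-emphasis-regexp-components) extra))
    ;; Recompute the emphasis regexps from the modified components.
    (org-set-emph-re 'org-emphasis-regexp-components
                     org-emphasis-regexp-components)))
```

Org buffers that are already open will probably need to be reverted or
have org-mode re-enabled before the new regexps affect highlighting.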

-- 
Aaron Ecay



[O] [BUG] Mark-up handling chokes on unicode whitespace

2014-09-23 Thread Tobias Getzner
When mark-up such as =monospace=, /italic/, etc. is preceded by a
non-8-bit whitespace, e.g., «narrow no-break space» (U+202F) or «no-break
space» (U+00A0), org-mode will not recognize the mark-up content
correctly; i.e., this content will fail to be syntax-highlighted, and
the mark-up syntax will be exported verbatim by the exporter.

Best regards,
Tobias