Re: Discrepancy between documentation and implementation regarding comments

2019-10-29 Thread Robert Pluim
> On Tue, 29 Oct 2019 15:14:37 +0100, Thibault Polge  said:

Thibault> Robert Pluim writes:
>> end of line *is* a whitespace character, but Iʼm not going to argue
>> that. Iʼm going to argue that this doesnʼt cover the case of a '#' at
>> EOB without a newline, hence saying 'zero or more' would be better.

Thibault> But zero-or-more would mean that this line:

Thibault> #Alpha

Thatʼs the problem with human language: itʼs imprecise. I meant

^[ \t]*#[ \t]*$

Robert



Re: Discrepancy between documentation and implementation regarding comments

2019-10-29 Thread Thibault Polge
Robert Pluim writes:

> end of line *is* a whitespace character, but Iʼm not going to argue
> that. Iʼm going to argue that this doesnʼt cover the case of a '#' at
> EOB without a newline, hence saying 'zero or more' would be better.

But zero-or-more would mean that this line:

#Alpha

is a comment, along with:

#+TITLE: My Org document

and virtually all Org meta-lines. I've thought about the \n#
issue, but I haven't tested how the current implementation behaves in
this regard.  I think the recent changes in Pandoc would parse it as a
comment.

Regards,
Thibault




Re: Discrepancy between documentation and implementation regarding comments

2019-10-29 Thread Robert Pluim
> On Mon, 28 Oct 2019 17:16:55 +0100, Nicolas Goaziou  said:

Nicolas> Hello,
Nicolas> Thibault Polge  writes:

>> Thanks Nicolas, just a small detail though: unless this is a planned
>> (breaking) change, I believe the description you linked should read:
>> 
>> A “comment line” starts with *zero or more whitespace characters,
>> followed by* a hash sign, followed by a whitespace character or an end
>> of line.

Nicolas> True. I fixed that.

end of line *is* a whitespace character, but Iʼm not going to argue
that. Iʼm going to argue that this doesnʼt cover the case of a '#' at
EOB without a newline, hence saying 'zero or more' would be better.

(and if it really is *one* whitespace character, thatʼs a breaking
change from at least org-9.2.6, which allows zero-or-more).

Robert



Re: Discrepancy between documentation and implementation regarding comments

2019-10-28 Thread Nicolas Goaziou
Hello,

Thibault Polge  writes:

> Thanks Nicolas, just a small detail though: unless this is a planned
> (breaking) change, I believe the description you linked should read:
>
> A “comment line” starts with *zero or more whitespace characters,
> followed by* a hash sign, followed by a whitespace character or an end
> of line.

True. I fixed that.

> Another detail: it could be nice to have a small appendix somewhere
> mapping character names to codepoints, since Unicode has no less than
> three “number signs” (from Wikipedia):
>
>  - U+0023 # NUMBER SIGN (HTML ). Other attested names in Unicode are:
>    pound sign, hash, crosshatch, octothorpe.
>  - U+FF03 ＃ FULLWIDTH NUMBER SIGN (HTML )
>  - U+FE5F ﹟ SMALL NUMBER SIGN (HTML )

This is left as an exercise to the reader. ;)

Regards,

-- 
Nicolas Goaziou



Re: Discrepancy between documentation and implementation regarding comments

2019-10-28 Thread Thibault Polge
Nicolas Goaziou writes:
> See  (with a nice
> typo...)

Thanks Nicolas, just a small detail though: unless this is a planned
(breaking) change, I believe the description you linked should read:

A “comment line” starts with *zero or more whitespace characters,
followed by* a hash sign, followed by a whitespace character or an end
of line.
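A minimal sketch of that wording as an Emacs regexp, written with `rx`. The names `my-org-comment-line-re` and `my-org-comment-line-p` are hypothetical, made up for illustration; this is not the regexp Org itself uses. Because the character after "#" must be a blank or the end of the line, "#Alpha" and "#+TITLE:" lines are rejected, while a bare "#" (even one at the end of the buffer with no newline) is accepted.

```elisp
;; Hypothetical sketch of the proposed wording; not Org's own regexp.
(require 'rx)

(defconst my-org-comment-line-re
  ;; Zero or more blanks, one "#", then a blank or the end of the line.
  ;; `blank' is horizontal whitespace only, so a newline is covered by `eol'.
  (rx bol (0+ blank) "#" (or blank eol))
  "Regexp for a comment line, per the wording proposed above.")

(defun my-org-comment-line-p (line)
  "Return non-nil if LINE, inserted alone in a buffer, is a comment line."
  (with-temp-buffer
    (insert line)
    (goto-char (point-min))
    (looking-at-p my-org-comment-line-re)))

;; (my-org-comment-line-p "# comment")                 ; => t
;; (my-org-comment-line-p "   #")                      ; => t, no newline needed
;; (my-org-comment-line-p "#Alpha")                    ; => nil
;; (my-org-comment-line-p "#+TITLE: My Org document")  ; => nil
```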

Another detail: it could be nice to have a small appendix somewhere
mapping character names to codepoints, since Unicode has no less than
three “number signs” (from Wikipedia):

 - U+0023 # NUMBER SIGN (HTML ). Other attested names in Unicode are:
   pound sign, hash, crosshatch, octothorpe.
 - U+FF03 ＃ FULLWIDTH NUMBER SIGN (HTML )
 - U+FE5F ﹟ SMALL NUMBER SIGN (HTML )
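Emacs can report the Unicode name of each codepoint listed above through `get-char-code-property`. The sketch below prints the mapping (the function name `my-number-sign-names` is hypothetical) and shows that a pattern written with the ASCII "#" does not match the fullwidth sign.

```elisp
;; Hypothetical helper: map each "number sign" codepoint to its Unicode name.
(defun my-number-sign-names ()
  "Return a list of (CODEPOINT CHAR-STRING NAME) for three number signs."
  (mapcar (lambda (cp)
            (list cp (char-to-string cp) (get-char-code-property cp 'name)))
          '(#x0023 #xFF03 #xFE5F)))

(dolist (entry (my-number-sign-names))
  ;; Prints one line per sign, e.g. "U+0023 # NUMBER SIGN".
  (message "U+%04X %s %s" (nth 0 entry) (nth 1 entry) (nth 2 entry)))

;; A regexp written with the ASCII "#" only matches U+0023:
;; (string-match-p "^[ \t]*#" "\uFF03 text") ; => nil
```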

Regards,
Thibault




Re: Discrepancy between documentation and implementation regarding comments

2019-10-28 Thread Nicolas Goaziou
Hello,

Thibault Polge  writes:

> According to Org-Mode documentation[1],

See  (with a nice
typo...)

Regards,

-- 
Nicolas Goaziou



Re: Discrepancy between documentation and implementation regarding comments

2019-10-27 Thread Samuel Wales
beware # at eob with no newline.
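A small sketch of that case, assuming a buffer whose last line is a lone "#" with no final newline (`my-eob-hash-demo` is a hypothetical name). The `(1+ space)` pattern from Adam's message needs a character after the "#", so it finds nothing, whereas a pattern that also accepts `eol` still matches, because `$` matches at the end of the buffer.

```elisp
;; Hypothetical demo: "#" at end of buffer, no trailing newline.
(defun my-eob-hash-demo ()
  "Return the results of two searches for a final bare \"#\"."
  (with-temp-buffer
    (insert "foo\n\n#")                 ; buffer ends right after "#"
    (list
     ;; Needs at least one whitespace char after "#": fails here.
     (progn (goto-char (point-min))
            (re-search-forward (rx bol "#" (1+ space)) nil t))
     ;; Accepts a blank or end of line; `$' also matches at end of buffer.
     (progn (goto-char (point-min))
            (re-search-forward (rx bol "#" (or blank eol)) nil t)))))

;; (my-eob-hash-demo) ; => (nil 7), i.e. only the second search succeeds
```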

On 10/27/19, Adam Porter  wrote:
> I agree with Robert that "whitespace" includes newlines in "Emacsland."
> For example, with this document (the second "#" has a newline
> immediately after, no spaces or tabs):
>
> #+BEGIN_SRC org
> foo
>
> # comment
>
> bar
>
> #
>
> buzz
> #+END_SRC
>
> This code matches both lines that begin with "#":
>
>   (re-search-forward (rx bol "#" (1+ space)))
>
> But this code only matches the first one, because "blank" only matches
> "horizontal whitespace":
>
>   (re-search-forward (rx bol "#" (1+ blank)))
>
> So I think Pandoc is technically at fault here.  However, outside of
> Emacs's own context, I can see how the documentation could be
> misinterpreted in this case, so it's hard to fault them too much.  :)
>
>
>





Re: Discrepancy between documentation and implementation regarding comments

2019-10-27 Thread Adam Porter
I agree with Robert that "whitespace" includes newlines in "Emacsland."
For example, with this document (the second "#" has a newline
immediately after, no spaces or tabs):

#+BEGIN_SRC org
foo

# comment

bar

#

buzz
#+END_SRC

This code matches both lines that begin with "#":

  (re-search-forward (rx bol "#" (1+ space)))

But this code only matches the first one, because "blank" only matches
"horizontal whitespace":

  (re-search-forward (rx bol "#" (1+ blank)))

So I think Pandoc is technically at fault here.  However, outside of
Emacs's own context, I can see how the documentation could be
misinterpreted in this case, so it's hard to fault them too much.  :)
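The comparison above can be reproduced outside an Org buffer with a small sketch; `my-count-matches` and `my-space-vs-blank-demo` are hypothetical helpers. Note that `space` ([[:space:]]) depends on the current syntax table: the sketch runs in a temporary buffer, which uses the standard syntax table where newline has whitespace syntax, so results in an Org buffer could in principle differ. `blank` ([[:blank:]]) matches horizontal whitespace only.

```elisp
;; Hypothetical helper: count matches of REGEXP in TEXT.
(defun my-count-matches (regexp text)
  "Return the number of non-overlapping matches of REGEXP in TEXT."
  (with-temp-buffer                     ; standard syntax table
    (insert text)
    (goto-char (point-min))
    (let ((n 0))
      (while (re-search-forward regexp nil t)
        (setq n (1+ n)))
      n)))

(defun my-space-vs-blank-demo ()
  "Count lines of the sample document matched by each pattern."
  (let ((doc "foo\n\n# comment\n\nbar\n\n#\n\nbuzz\n"))
    (list
     ;; [[:space:]]+ also consumes the newline after the bare "#".
     (my-count-matches (rx bol "#" (1+ space)) doc)
     ;; [[:blank:]]+ needs a space or tab, so the bare "#" is skipped.
     (my-count-matches (rx bol "#" (1+ blank)) doc))))

;; (my-space-vs-blank-demo) ; => (2 1)
```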




Re: Discrepancy between documentation and implementation regarding comments

2019-10-27 Thread Robert Pluim
> On Sun, 27 Oct 2019 11:07:20 +0100, Thibault Polge  said:

Thibault> Hello,
Thibault> According to Org-Mode documentation[1],

>> Lines starting with zero or more whitespace characters followed by one
>> ‘#’ and a whitespace are treated as comments and, as such, are not
>> exported.

'whitespace' in Emacs normally covers newline as well. Of course Org
might mean 'at least one space or tab', but as you say, thatʼs not
what the implementation does. E.g. in Org 9.2.6, org-fill-element does

(re-search-backward "^[ \t]*#[ \t]*$" begin t)

However org-at-comment-p does

(looking-at "^[ \t]*# ")

so thereʼs some possible inconsistency there.
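A small sketch of that inconsistency, classifying a few lines with the two regexps exactly as quoted above (from Org 9.2.6; other versions may differ). `my-classify` and the two constant names are hypothetical. The two patterns disagree on a bare "#": the first accepts it and rejects "# comment", the second does the opposite.

```elisp
;; Regexps copied verbatim from the message above (Org 9.2.6).
(defconst my-fill-re "^[ \t]*#[ \t]*$"
  "Regexp quoted from `org-fill-element'.")
(defconst my-at-comment-re "^[ \t]*# "
  "Regexp quoted from `org-at-comment-p'.")

(defun my-classify (line)
  "Hypothetical helper: return (FILL-MATCH AT-COMMENT-MATCH) for LINE."
  (list (and (string-match-p my-fill-re line) t)
        (and (string-match-p my-at-comment-re line) t)))

;; (my-classify "# comment") ; => (nil t)
;; (my-classify "#")         ; => (t nil)
;; (my-classify "#Alpha")    ; => (nil nil)
```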

FWIW, Iʼd vote for expressing it as 'zero or more whitespace followed
by one # followed by zero or more whitespace'

Robert



Discrepancy between documentation and implementation regarding comments

2019-10-27 Thread Thibault Polge
Hello,

According to the Org-Mode documentation[1],

> Lines starting with zero or more whitespace characters followed by one
> ‘#’ and a whitespace are treated as comments and, as such, are not
> exported.

The actual implementation differs on a subtle detail: Org-Mode will
treat a line where the pound sign is immediately followed by \n as a
comment.  I believe this is the expected behavior, since it allows
commenting out multiple paragraphs, and it keeps working even after
running `delete-trailing-whitespace`.
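A minimal sketch of the `delete-trailing-whitespace` point, assuming a strict rule that requires a literal space after "#" (the `org-at-comment-p` regexp from Org 9.2.6 quoted in Robert's reply). `my-dtw-demo` is a hypothetical name. An empty comment line typed as "# " becomes "#" once trailing whitespace is deleted, and the strict regexp then no longer matches it.

```elisp
;; Hypothetical demo of why a "space after #" rule is fragile.
(defun my-dtw-demo ()
  "Return the cleaned text and how many lines \"^[ \\t]*# \" still matches."
  (with-temp-buffer
    ;; The middle line is an empty comment line written as "# ".
    (insert "# first paragraph\n# \n# second paragraph\n")
    (delete-trailing-whitespace)        ; turns "# " into "#"
    (let ((text (buffer-string))
          (n 0))
      (goto-char (point-min))
      (while (re-search-forward "^[ \t]*# " nil t)
        (setq n (1+ n)))
      (list text n))))

;; (my-dtw-demo) ; => ("# first paragraph\n#\n# second paragraph\n" 2)
```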

I'm asking this because Pandoc strictly follows the Org documentation,
and treats a line containing only a pound sign as text.  I opened a bug
about this[2], where I've been asked, reasonably, to first make sure the
bug isn't actually in Org Mode.

[1] https://orgmode.org/manual/Comment-lines.html

[2] https://github.com/jgm/pandoc/issues/5856

All the best,

--
Thibault

