Re: Extra paragraphs incorrectly spawning when ":end:" appears.

2024-02-13 Thread Ihor Radchenko
Ihor Radchenko writes:

> "Tom Alexander" writes:
>
>> This test document should have 1 paragraph but org-mode is parsing it as 2:
>> ```
>> foo
>> :end:
>> baz
>> ```
>>
>> which parses as:
>> ```
>> (section
>>   (paragraph "foo\n")
>>   (paragraph ":end:\nbaz\n")
>> )
>> ```
> 
> The documentation is not accurate here.
>
> The parser uses anything that _potentially_ looks like the beginning of
> another element to calculate paragraph boundaries
> (`org-element-paragraph-separate'). ":end:" is potentially a drawer and
> thus ends the preceding paragraph.

I was wrong.
`org-element-paragraph-parser' actually does perform forward-checking,
so your example is a genuine bug in the parser. (The relevant tests were
also not very accurate, due to copy-pasting.)
Fixed on main.
https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=61c235b77
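
For readers unfamiliar with the idea, here is a minimal Python sketch of what forward-checking means for paragraph boundaries. It is a toy model and not the code in `org-element-paragraph-parser' or in the commit above: the regexps `DRAWER_START` and `DRAWER_END` and the helpers `ends_paragraph` and `first_paragraph` are hypothetical names invented for illustration, and the real Org regexps are more involved.

```python
import re

# Hypothetical, simplified stand-ins for Org's drawer regexps.
DRAWER_START = re.compile(r"^[ \t]*:[\w-]+:[ \t]*$")
DRAWER_END = re.compile(r"^[ \t]*:END:[ \t]*$", re.IGNORECASE)


def ends_paragraph(lines, i):
    """Return True if line i should end the paragraph that precedes it."""
    line = lines[i]
    if not line.strip():
        # A blank line always ends a paragraph.
        return True
    if DRAWER_START.match(line):
        # Forward check: a drawer-like line only counts as a drawer
        # if a closing :END: line exists somewhere after it.
        return any(DRAWER_END.match(later) for later in lines[i + 1:])
    return False


def first_paragraph(lines):
    """Collect the lines of the first paragraph (first line assumed non-blank)."""
    paragraph = []
    for i, line in enumerate(lines):
        if paragraph and ends_paragraph(lines, i):
            break
        paragraph.append(line)
    return paragraph
```

Under this model, `first_paragraph(["foo", ":end:", "baz"])` keeps all three lines together, because the ":end:" line has no closing line after it, whereas a drawer-like line followed by a later ":end:" does end the paragraph.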

-- 
Ihor Radchenko // yantar92,
Org mode contributor



Re: Extra paragraphs incorrectly spawning when ":end:" appears.

2023-10-02 Thread Tom Alexander
Hmm, thanks, that makes sense. I guess a post-processing step to merge adjacent
paragraphs wouldn't work either, since that wouldn't stitch together objects
like the bold in this test document without re-parsing the entire paragraph:
```
foo *bar
:end:
baz*
```
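
A small Python sketch of why merging alone is not enough. It assumes a hypothetical, heavily simplified bold matcher (`BOLD` and `bold_spans` are invented here and are not Org's emphasis rules) and only shows that objects found per paragraph differ from objects found in the merged text:

```python
import re

# Hypothetical, much-simplified bold matcher: *text*, allowed to span lines.
# Org's real emphasis rules are stricter than this.
BOLD = re.compile(r"\*(\S(?:.*?\S)?)\*", re.DOTALL)


def bold_spans(paragraph_text):
    """Return the contents of every bold span found in one paragraph."""
    return BOLD.findall(paragraph_text)


# The two paragraphs as the split parse produced them.
first = "foo *bar\n"
second = ":end:\nbaz*\n"

# Objects parsed inside each paragraph separately: no bold is found,
# because neither paragraph contains both markers.
per_paragraph = bold_spans(first) + bold_spans(second)

# Bold only appears once the merged text is parsed again as a whole.
merged = bold_spans(first + second)
```

Here `per_paragraph` is empty while `merged` contains one span, so concatenating the two already-parsed paragraphs would still need a second parse of the combined text to recover the bold object.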

Oh well.

--
Tom Alexander
pgp: https://fizz.buzz/pgp.asc



Re: Extra paragraphs incorrectly spawning when ":end:" appears.

2023-10-01 Thread Ihor Radchenko
"Tom Alexander" writes:

> This test document should have 1 paragraph but org-mode is parsing it as 2:
> ```
> foo
> :end:
> baz
> ```
>
> which parses as:
> ```
> (section
>   (paragraph "foo\n")
>   (paragraph ":end:\nbaz\n")
> )
> ```
>
> The paragraph documentation[1] states that:
>> Empty lines and other elements end paragraphs.
>
> But the document contains no empty lines and we can see in the output that it 
> only contains paragraphs.

The documentation is not accurate here.

The parser uses anything that _potentially_ looks like the beginning of
another element to calculate paragraph boundaries
(`org-element-paragraph-separate'). ":end:" is potentially a drawer and
thus ends the preceding paragraph.

Later, the ":end:" line is parsed as a new structural element using
`org-element-drawer-parser'. The drawer parser detects that there is no
closing :end: line and thus falls back to paragraph parsing:

  (defun org-element-drawer-parser (limit affiliated)
    ...
        ;; Incomplete drawer: parse it as a paragraph.
        (org-element-paragraph-parser limit affiliated)

The same logic applies to a number of other incomplete elements.
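
The behavior described above can be modeled with a short Python sketch. This is a toy single-pass parser, not Org's implementation: `SEPARATOR` is a hypothetical, drastically reduced stand-in for `org-element-paragraph-separate', and `DRAWER_START`, `DRAWER_END` and `parse` are names invented for illustration. Dynamic blocks are not modeled at all; a "#+BEGIN" line simply starts a new element that ends up as a paragraph.

```python
import re

# Hypothetical, reduced stand-in for `org-element-paragraph-separate':
# a blank line, a drawer-like line, or a line starting with #+BEGIN.
SEPARATOR = re.compile(r"^[ \t]*(?:$|:[\w-]+:[ \t]*$|#\+BEGIN)", re.IGNORECASE)
DRAWER_START = re.compile(r"^[ \t]*:[\w-]+:[ \t]*$")
DRAWER_END = re.compile(r"^[ \t]*:END:[ \t]*$", re.IGNORECASE)


def parse(lines):
    """Single pass over the lines; already-parsed elements are never revisited."""
    elements = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # skip blank lines between elements
            continue
        if DRAWER_START.match(lines[i]):
            # Drawer parser: look for a closing :END: line after the opener.
            close = next(
                (k for k in range(i + 1, len(lines)) if DRAWER_END.match(lines[k])),
                None,
            )
            if close is not None:
                elements.append(("drawer", lines[i:close + 1]))
                i = close + 1
                continue
            # Incomplete drawer: fall through and parse it as a paragraph.
        # Paragraph parser: extend until the next potential separator.
        j = i + 1
        while j < len(lines) and not SEPARATOR.match(lines[j]):
            j += 1
        elements.append(("paragraph", lines[i:j]))
        i = j
    return elements
```

With this model, `parse(["foo", ":end:", "baz"])` yields two paragraphs, `["foo"]` and `[":end:", "baz"]`: the first paragraph was already closed when the ":end:" line turned out not to be a drawer.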

The reason for the current logic, rather than re-parsing the preceding
paragraph when we encounter an incomplete drawer, block, etc., is that the
Org parser is written to do a single pass - we never re-parse already parsed
parts. Doing things otherwise, while it could solve certain non-intuitive
behaviors, would be problematic performance-wise.

So, the actual paragraph separator that should be used is the
`org-element-paragraph-separate' regexp.

We need to fix the WORG syntax description accordingly.

-- 
Ihor Radchenko // yantar92,
Org mode contributor



Re: Extra paragraphs incorrectly spawning when ":end:" appears.

2023-09-30 Thread Tom Alexander
Same problem occurs with this sample document:
```
foo
#+BEGIN: bar
baz
```

which parses as:
```
(section
  (paragraph "foo\n")
  (paragraph "#+BEGIN: bar\nbaz\n")
)
```

Again, there are no blank lines and no non-paragraph elements, but the single
paragraph got split in two.

--
Tom Alexander
pgp: https://fizz.buzz/pgp.asc



Extra paragraphs incorrectly spawning when ":end:" appears.

2023-09-30 Thread Tom Alexander
This test document has 1 paragraph:
```
foo
bar
baz
```
which parses as:
```
(section
  (paragraph "foo\nbar\nbaz\n")
)
```

This test document should have 1 paragraph but org-mode is parsing it as 2:
```
foo
:end:
baz
```

which parses as:
```
(section
  (paragraph "foo\n")
  (paragraph ":end:\nbaz\n")
)
```

The paragraph documentation[1] states that:
> Empty lines and other elements end paragraphs.

But the document contains no empty lines and we can see in the output that it 
only contains paragraphs.

[1] https://orgmode.org/worg/org-syntax.html#Paragraphs

--
Tom Alexander
pgp: https://fizz.buzz/pgp.asc