Re: Org Syntax Specification

2022-11-25 Thread Bastien
Ihor Radchenko  writes:

> I think we need to use rewrite rule here in addition to moving the file.
> If we simply move the file, old links will be broken.

Done now, thanks.

-- 
 Bastien



Re: Org Syntax Specification

2022-11-25 Thread Ihor Radchenko
Bastien  writes:

> A few suggestions:
>
> - Make it a description of the syntax of the latest stable Org.  (For
>   now let's consider 9.6 to be the latest stable as we are working on
>   releasing it soon.)  Perhaps this is already the case and I missed
>   it?

Yes, it should be consistent with the latest Org. (We tried our best to
make it so). We also kept some things a bit more generic for forward
compatibility.

> - Remove the "draft" status ("DRAFT v2β"). Don't describe it as a
>   draft in the org-manual.org if it accurately reflects the current
>   syntax (current = latest stable).

> - Remove all the inline notes (some suggest changes in Org's grammar,
>   that might scare the readers a bit.)

See https://orgmode.org/list/87ilj2l0cy.fsf@localhost

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at .
Support Org development at ,
or support my work at 



Re: Org Syntax Specification

2022-11-25 Thread Ihor Radchenko
Bastien  writes:

> - Promote the page to orgmode.org/worg/org-syntax.html: the /dev/ path
>   in the current URL makes it read like it is the syntax for the "dev"
>   version.

I think we need to use rewrite rule here in addition to moving the file.
If we simply move the file, old links will be broken.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at .
Support Org development at ,
or support my work at 



Re: Org Syntax Specification

2022-09-25 Thread Rohit Patnaik
I also want to chip in with a thank-you for the org syntax specification page. 
As someone who's working on a custom org exporter, this is a very useful 
resource for finding out how elements are structured within org-mode.

Thanks,
Rohit




Re: Org Syntax Specification

2022-09-25 Thread Bastien
Hi Timothy,

I'm late to the party, but *thanks* for these important improvements
on the https://orgmode.org/worg/dev/org-syntax.html page!

A few suggestions:

- Make it a description of the syntax of the latest stable Org.  (For
  now let's consider 9.6 to be the latest stable as we are working on
  releasing it soon.)  Perhaps this is already the case and I missed
  it?

- Remove the "draft" status ("DRAFT v2β"). Don't describe it as a
  draft in the org-manual.org if it accurately reflects the current
  syntax (current = latest stable).

- Remove all the inline notes (some suggest changes in Org's grammar,
  that might scare the readers a bit.)

- Promote the page to orgmode.org/worg/org-syntax.html: the /dev/ path
  in the current URL makes it read like it is the syntax for the "dev"
  version.

What do you think?

-- 
 Bastien



Re: Org Syntax Specification

2022-01-19 Thread Ihor Radchenko
Tom Gillespie  writes:

> 3. When I say grammar in this context I mean specifically an eBNF that
>generates a LALR(1) or LR(1) parser. This is narrower than the
>definition used in the document, which includes things that have to
>be implemented in the tokenizer, or in a pass after the grammar has
>been applied, or are related to some other aspect beyond the pure
>surface syntax.

I feel that we should not be trying to fit into LR at the expense of
complicating the document. When looking at earlier versions of the
grammar, I mostly had GLR in mind.

> In my thinking I separate the context sensitive nature of parsing from
> the nesting structure of the resulting sexpressions, org elements,
> etc.The most obvious example of this is that the sexpression
> representation for headings nests based on the level of the heading,
> but heading level cannot be determined by the grammar so it must be
> reconstructed from a flat sequence of headings that have varying level.

1. I think that results sexpression is important to describe. We
   eventually plan to provide a reference test set to verify external
   parsers against org-element.el [1]. It is important to describe the
   nesting with this consideration.

2. You actually can determine the end of heading if you are allowed to
   do lookaheads (which is anyway necessary to parse
   #+begin_blah..#+end_blah). The end of current heading is
   "eof|^\*{,N-current-heading} "

[2] https://list.orgmode.org/spmq6a$2s5$1...@ciao.gmane.io/T/#t

> ... I think the
> other issue I was having here is that the spec for tables is spread
> allover the place, and it would be much easier to understand and
> implement ifit were all in one place.

That sounds fine for me. Though your next suggestion appears to be
exactly opposite:

> I think your version is quite a bit more readable.  Can we list the
> set of all the elements that can be ended by a new lineas well as
> those that cannot (iirc they are elements such as footnotes that can
> only be ended by a double blank line or a heading)?

The intention behind listing the exceptions for table cells was exactly
as you thinking about open-ended elements. 

>> I am not sure here. Inline tasks are special because a one-line inline
>> task must not contain any text below, cannot have planning or
>> properties.
>
> Then they are no longer inline tasks, but instead parse as headings, correct?

They are still inline tasks. Consider the below example:

* Normal heading

Paragraph
** Inline task
SCHEDULED: <2022-01-19> <- this is an ordinary paragraph, not a part of inline 
task
Continuing "SCHEDULED" paragraph, not a part of inline task

* Next heading

The parsed sexp will be
(heading
  (paragraph)
  (inlinetask)
  (paragraph))
(heading)

>> If we mention this, we also need to elaborate kind of element is
>> #+todo:, where it can be located, and how to parse multiple instances of
>> #+todo in the document.
>
> Yes. What I have written for laundry is that only #+todo: declarations
> that appear in the zeroth section will be applied (this is true for
> all document level configuration keywords). There is also a
> possibility that we might be able to support including #+todo:
> keywords (and #+link: definitions or similar) in further sections, but
> that they would only apply to headings that occur after that line in
> the file. Such behavior is likely to be confusing to users so probably
> best to only guarantee correct behavior if they are put in the zeroth
> section.
>
> The reason it is confusing/problematic is that there could be
> a #+todo: buried half way down a file, the buffer configuration is
> updated, and then a user can use keywords up the file in the elisp
> implementation. Another implementation that parses a file
> incrementally would not encounter the buried #+todo: keyword until
> after they have already emitted a heading,changing how a heading is
> parsed. There is a similar issue with the #+link: keyword.

That's why it was initially not included into the syntax document. If we
fall into this rabbit hole, we also need to describe things like
CATEGORY, PROPERTY, OPTIONS, PRIORITIES, PROPERTY, SEQ_TODO, STARTUP,
TYP_TODO, etc.

>> > +All content following a heading that appears before the next heading
>> > +(regardless of the level of that next heading) is a section.
>>
>> Note that it is not true for one-line inline tasks.
>
> I'm not quite sure which part you are referring to here.

I only left the relevant part this time. Also, see the example above.
Inline task only consists of a single line. Nothing below is a part of
it.

> Let's look into how much work it will be and how disruptive it might
> be?  We are already changing to heading in the elisp so maybe now
> would be a good time to also change from section to segment?
> Alternatively we could start by updating the documentation and include
> a note that segments are currently called sections by org 

Re: Org Syntax Specification

2022-01-18 Thread Tom Gillespie
Hi Ihor,
  Thank you very much for the detailed responses. Let me start with
some context.

1. A number of the comments that I made fall into the brainstorming
   category, so they don't need to make their way into the document at
   this time. I agree that it is critical for this document to capture
   how org is parsed right now and that we should not put the
   pie-in-the-sky changes in until the behavior of org-element matches
   (if such a change is made at all).
2. Though I haven't been hacking on it, I fully intend to contribute
   test cases and exploratory work on org-element in the future, so
   please don't interpret some of what I am writing as requests for
   other people to write code (unless they want to :)
3. When I say grammar in this context I mean specifically an eBNF that
   generates a LALR(1) or LR(1) parser. This is narrower than the
   definition used in the document, which includes things that have to
   be implemented in the tokenizer, or in a pass after the grammar has
   been applied, or are related to some other aspect beyond the pure
   surface syntax.
4. A number of my comments are about the structure of the document
   more than the structure of the syntax or the implementation. I
   think that most of them are trying to ask whether we want to
   clearly delineate pure surface syntax from semantics to make the
   document easier to understand.

More replies in line.
Best!
Tom

> As for your other comments, you seem to be suggesting a number of
> changes to the existing Org syntax. Some of them looks fine, some are
> not. However, please keep in mind that we have to deal with back
> compatibility, third party compatibility, and not breaking existing Org
> documents unless we have a very strong justification. I suggest to
> branch a number of new threads from here for each concrete suggestion
> where you want to make changes to Org syntax, as opposed to just
> document wording. Otherwise, this discussion will become a total mess.

Agreed. I put many of these in here as notes from my experiences, I
will branch those off into separate discussions so that we don't
pollute this thread.

> Nope. Sections are actually elements. See =org-element-all-elements=.

I realized this at a slightly later date but missed cleaning up this
comment.  See my response on section vs segment below.

> I disagree. Nesting rules are the important part of syntax. We have
> restrictions on what elements can be inside other element. The same
> patterns are not recognised in Org depending on their nesting. For
> example, links that you put into property drawers are not considered
> link objects.

When I wrote this comment I was still confused about sections.I think
discussion of nesting in most contexts is ok, but there are some case
where nesting cannot be determined from the grammar, and there I think
we need to make a distinction.

In my thinking I separate the context sensitive nature of parsing from
the nesting structure of the resulting sexpressions, org elements,
etc.The most obvious example of this is that the sexpression
representation for headings nests based on the level of the heading,
but heading level cannot be determined by the grammar so it must be
reconstructed from a flat sequence of headings that have varying level.

> Again I disagree. While your idea about table cells is reasonable
> (similar for citation-references inside citations), I am against
> decoupling Org syntax from org-element implementation. In
> org-element.el, table-cells are just yet another object. If we make
> things in org-element and syntax document out of sync, confusion and
> errors will follow during future maintenance.

Org element treats all elements and objects as a single homogenous
type.  This is fine. However, to help people understand the syntax it
seems easier to define things in a positive way so that we don't say
"all except these two."  Therefore, despite the fact that the
implementation of org-element treats table rows and cells no different
from any other node in the parse tree, we don't need to burden the
reader with that information at this point in time, and could provide
that information as an implementation note for cells.  I think the
other issue I was having here is that the spec for tables is spread
allover the place, and it would be much easier to understand and
implement ifit were all in one place.

> This actually reads slightly confusing. "Blank lines separate paragraphs
> and other elements" sounds like blank lines are only relevant
> before/after paragraphs. However, there are also footnote references and
> lists. Maybe we can try something like:
>
> Blank lines can be used to indicate end of some elements.
>
> "can" because a single blank line usually does not separate anything.

I think your version is quite a bit more readable.  Can we list the
set of all the elements that can be ended by a new lineas well as
those that cannot (iirc they are elements such as footnotes that can
only 

Re: Org Syntax Specification

2022-01-18 Thread Ihor Radchenko
Tom Gillespie  writes:

> Extremely in favor of removing switches. There are so many better ways
> to do this now that aren't like some eldritch unix horror crawling up
> out of the abyss and into the eBNF :)

I also agree that switches and $$-style equations may be deprecated.
We can
1. Do not mention them in the document
2. Add org-lint warnings about obsoletion

As for your other comments, you seem to be suggesting a number of
changes to the existing Org syntax. Some of them looks fine, some are
not. However, please keep in mind that we have to deal with back
compatibility, third party compatibility, and not breaking existing Org
documents unless we have a very strong justification. I suggest to
branch a number of new threads from here for each concrete suggestion
where you want to make changes to Org syntax, as opposed to just
document wording. Otherwise, this discussion will become a total mess.

More details below.

> +Elements are further divided into "[[#Headings][headings]]", 
> "[[#Sections][sections]]"[fn::sections are not elements], 
> "[[#Greater_Elements][greater

Nope. Sections are actually elements. See =org-element-all-elements=.

> +other headings. [fn:tom2:I would not discuss strata here because it is
> +not related to the syntax of the document. It is related to how that
> +syntax is interpreted by org mode. The strata are nesting rules that
> +are independent of the syntax, and discussing that here in the syntax
> +document is confusing, because the nesting is not something that can be
> +parsed directly because it depends on the number of asterisks.]

I disagree. Nesting rules are the important part of syntax. We have
restrictions on what elements can be inside other element. The same
patterns are not recognised in Org depending on their nesting. For
example, links that you put into property drawers are not considered
link objects.
  
> +citation references and [[#Table_Cells][table cells]].[fn:tom3:Table cells 
> should
> +be treated in a way that is entirely separate from objects. This document 
> has included
> +them as such as has org-element (iirc) however since they can never appear 
> in a paragraph
> +and because tables are completely separate syntactically, we should probably 
> drop the
> +idea that table cells are objects. I realize that this might mean the 
> creation of a
> +distinction between paragraph-objects, title-objects, table-objects etc.]

Again I disagree. While your idea about table cells is reasonable
(similar for citation-references inside citations), I am against
decoupling Org syntax from org-element implementation. In
org-element.el, table-cells are just yet another object. If we make
things in org-element and syntax document out of sync, confusion and
errors will follow during future maintenance.
  
>  A line containing only spaces, tabs, newlines, and line feeds (=\t\n\r=)
> -is considered a /blank line/.  Blank lines can be used to separate
> +is considered a /blank line/.  Blank lines separate
>  paragraphs and other elements.

This actually reads slightly confusing. "Blank lines separate paragraphs
and other elements" sounds like blank lines are only relevant
before/after paragraphs. However, there are also footnote references and
lists. Maybe we can try something like:

Blank lines can be used to indicate end of some elements.

"can" because a single blank line usually does not separate anything.

> +considered part of the paragraph.[fn:tom4:I don't think we need to discuss
> +nesting scope here, it is confusing, it is always the immediately prior
> +(lesser?) element.]

Then where can we put it? This is one of the tricky conventions we use
in the parser.
  
> ++ STARS :: A string consisting of one or more asterisks[fn::removed
> +  note about inline tasks because it is still a heading, any mention
> +  of a concrete number should not appear in the specification of
> syntax.]

I am not sure here. Inline tasks are special because a one-line inline
task must not contain any text below, cannot have planning or
properties.

> +  contains =TODO= and =DONE=, however org-todo-keywords-1 is a buffer local
> +  variable and can be set by users in an org file using =#+todo:=.].

If we mention this, we also need to elaborate kind of element is
#+todo:, where it can be located, and how to parse multiple instances of
#+todo in the document.

> -A heading contains directly one section (optionally), followed by
> -any number of deeper level headings.
> +The level of a heading can be used to construct a nested structure.
> +All content following a heading that appears before the next heading
> +(regardless of the level of that next heading) is a section. In addition,
> +text before the first heading in an org document is also a section.

Note that it is not true for one-line inline tasks.

> +considered a section), sections only occur within headings.[fn:: The
> +choice to call this syntactic component a section is confusing because
> +it is at odds with the usual 

Re: Org Syntax Specification

2022-01-17 Thread Tom Gillespie
Hi Timothy,
I have attached a patch with some modifications and a bunch of
comments (as footnotes). More replies in line. Thank you for all your
work on this!
Tom

> Marking this as depreciated would have no effect on Org’s current behaviour, 
> but we could:
>
> Mark as depreciated now-ish
> Add a utility to convert from TeX-style to LaTeX-style
> Add org lint/fortification warnings
> A while later (half a decade? more?) actually remove support

In favor of this. There are good alternatives for this now.

> The other component of the syntax which feels particularly awkward to me is 
> source block switches. They seem a bit odd, and since arguments exist, 
> completely redundant.

Extremely in favor of removing switches. There are so many better ways
to do this now that aren't like some eldritch unix horror crawling up
out of the abyss and into the eBNF :)
From 3527331f02e593ec6ba6cb4c8bde3f64de3ad216 Mon Sep 17 00:00:00 2001
From: Tom Gillespie 
Date: Mon, 17 Jan 2022 19:34:21 -0500
Subject: [PATCH] Tom's comments and modifications to org syntax edited

I removed any mention of markdown because it is a distraction in this
document and is not something we want anyone attending to here.

I change "top level section" to "zeroth section" which I think is more
consistent terminology because level is often used to refer to the
depth of parsing at any given point in the file and the top level
refers to anything that can be parsed without context. Zeroth makes it
clear that we are talking about the actual zeroth occurrence of a
section in a file/buffer/stream.
---
 dev/org-syntax-edited.org | 399 +++---
 1 file changed, 331 insertions(+), 68 deletions(-)

diff --git a/dev/org-syntax-edited.org b/dev/org-syntax-edited.org
index c3259473..2e99070d 100644
--- a/dev/org-syntax-edited.org
+++ b/dev/org-syntax-edited.org
@@ -19,9 +19,7 @@ under the GNU General Public License v3 or later.
 Org is a plaintext format composed of simple, yet versatile, forms
 which represent formatting and structural information.  It is designed
 to be both intuitive to use, and capable of representing complex
-documents.  Like [[https://datatracker.ietf.org/doc/html/rfc7763][Markdown]], Org may be considered a lightweight markup
-language.  However, while Markdown refers to a collection of similar
-syntaxes, Org is a single syntax.
+documents.
 
 This document describes and comments on Org syntax as it is currently
 read by its parser (=org-element.el=) and, therefore, by the export
@@ -32,14 +30,13 @@ framework.
 ** Objects and Elements
 
 The components of this syntax can be divided into two classes:
-"[[#Objects][objects]]" and "[[#Elements][elements]]".  To better understand these classes,
-consider the paragraph as a unit of measurement.  /Elements/ are
-syntactic components that exist at the same or greater scope than a
-paragraph, i.e. which could not be contained by a paragraph.
-Conversely, /objects/ are syntactic components that exist with a smaller
-scope than a paragraph, and so can be contained within a paragraph.
-
-Elements can be stratified into "[[#Headings][headings]]", "[[#Sections][sections]]", "[[#Greater_Elements][greater
+"[[#Elements][elements]]" and "[[#Objects][objects]]".  Elements are
+syntactic components that have the same priority as or greater
+priority than a paragraph. Objects are syntactic components that are
+only recognized inside a paragraph or other paragraph-like elements
+such as heading titles.
+
+Elements are further divided into "[[#Headings][headings]]", "[[#Sections][sections]]"[fn::sections are not elements], "[[#Greater_Elements][greater
 elements]]", and "[[#Lesser_Elements][lesser elements]]", from broadest scope to
 narrowest.  Along with objects, these sub-classes define categories of
 syntactic environments.  Only [[#Headings][headings]], [[#Sections][sections]], [[#Property_Drawers][property drawers]], and
@@ -52,7 +49,12 @@ elements that cannot contain any other elements.  As such, a paragraph
 is considered a lesser element.  Greater elements can themselves
 contain greater elements or lesser elements. Sections contain both
 greater and lesser elements, and headings can contain a section and
-other headings.
+other headings. [fn:tom2:I would not discuss strata here because it is
+not related to the syntax of the document. It is related to how that
+syntax is interpreted by org mode. The strata are nesting rules that
+are independent of the syntax, and discussing that here in the syntax
+document is confusing, because the nesting is not something that can be
+parsed directly because it depends on the number of asterisks.]
 
 ** The minimal and standard sets of objects
 
@@ -60,25 +62,33 @@ To simplify references to common collections of objects, we define two
 useful sets.  The /<<>> of objects/ refers to [[#Plain_Text][plain text]], [[#Emphasis_Markers][text
 markup]], [[#Entities][entities]], [[#LaTeX_Fragments][LaTeX fragments]], 

Re: Org Syntax Specification

2022-01-15 Thread Sébastien Miquel

Hi,

The new document seems much clearer. It makes a nice complement to the
manual and we should definitely lose the (draft). Thank you Timothy
for the work.

Lastly, having spent a while looking at the syntax, I’m wondering if 
we should take this opportunity to mark some of the syntactic elements 
we’ve become less happy with as *(depreciated)*. I’m specifically 
thinking of the TeX-style LaTeX fragments which have been a bit of a 
pain. To quote Nicolas in org-syntax.org:


It would introduce incompatibilities with previous Org versions,
but support for |$...$| (and for symmetry, |$$...$$|) constructs
ought to be removed.

They are slow to parse, fragile, redundant and imply false
positives. — ngz



This quote has been mentioned a few times lately, and no one has yet
spoken in favor of the $…$ syntax, so I'll have a quick go.

It is easier to use, easier to read and more commonly used (and known)
in tex documents (a quick web search for sample tex documents confirms
the latter). Removing this syntax would make org slightly harder to
pick up, with respect to writing scientific documents.

As for the listed shortcomings, I don't think we know whether its
slowness is significant and false positives can be avoided by using
the \dollar entity (possibly ?). In my own use, the only usability
issue I can think of is false negatives while writing : inserting a
space or other such characters at the end of a snippet removes the
fontification (I solve this by modifying the fontification regexps).

Regards,

--
Sébastien Miquel