Re: Inline markup: How does org identify nested code/verbatim?

2023-01-30 Thread Tom Gillespie
In short, you cannot nest code in verbatim or verbatim in code
because they are both terminal (end of the line for nesting).
In fact you can't nest anything inside them by their very nature.

Anything inside of them cannot have special functionality; even
escape codes don't play well in that part of the grammar.

There is no way around this because you cannot nest inside
things that are by definition terminal. However, from your
examples it seems that you can get the effect you are looking
for using ~is~ =verbatim= ~in code~.
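The "terminal" idea can be illustrated with a small Python sketch. This is not Org's implementation. A single alternation regex, scanned left to right, consumes the whole contents of whatever object it matches first, so a ~ inside =...= is never looked at again. The suggested workaround simply produces three adjacent objects. The pattern name TERMINAL and the helper terminal_objects are hypothetical, and Org's rules about which characters may surround a marker are left out.

```python
import re

# Hypothetical combined pattern, one alternative per terminal marker.
# Org's pre/post character rules are omitted to keep the sketch short.
TERMINAL = re.compile(r"~(?P<code>.+?)~|=(?P<verbatim>.+?)=")


def terminal_objects(text):
    """List (type, raw contents) for each code/verbatim object, left to right."""
    # m.lastgroup is the name of the alternative that actually matched.
    return [(m.lastgroup, m.group(m.lastgroup)) for m in TERMINAL.finditer(text)]


print(terminal_objects("This =is ~code~ in verbatim= text."))
# [('verbatim', 'is ~code~ in verbatim')]
print(terminal_objects("~is~ =verbatim= ~in code~"))
# [('code', 'is'), ('verbatim', 'verbatim'), ('code', 'in code')]
```

The inner ~code~ is never reported, because the verbatim match already consumed it.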


Re: Inline markup: How does org identify nested code/verbatim?

2023-01-30 Thread Ihor Radchenko
 writes:

> The point is that I'm able to identify code or verbatim with a regex
> containing three capture groups for the content before, between and
> after the inline markers.
>
> for verbatim: "(^|[ .,;:\-?!({\"'])=(.*?)=([ .,;:\-?!)}\"']|$)"
> for code: "(^|[ .,;:\-?!({\"'])~(.*?)~([ .,;:\-?!)}\"']|$)"
>  
> But they don't work together. In the example above I need to use the
> verbatim regex first to get it right.

See https://orgmode.org/worg/org-syntax.html#Emphasis_Markers
Note that Org is not context-free. Within Org AST elements that can
contain objects, the first match "wins":
1. Org looks at the text and searches for the first match of any object regexp
2. Everything before the match is considered plain-text
3. Everything inside the match is considered the matched object and then
   parsed recursively
4. Continue after the match and go to (1)
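These four steps can be sketched in Python roughly as follows. This is a simplified illustration, not org-element's code. The marker table, the PRE/POST character classes (taken from the regexes quoted above) and the function name parse are assumptions, and several rules from the syntax document (for example, that contents may not start or end with whitespace) are skipped. Code and verbatim are treated as terminal, so their contents are returned raw; other markup is parsed recursively.

```python
import re

# Simplified marker table (an assumption; Org also has + and _ and more rules).
MARKERS = {"=": "verbatim", "~": "code", "/": "italic", "*": "bold"}
TERMINAL = {"verbatim", "code"}  # contents of these are never parsed again

# Allowed characters before/after a marker, taken from the regexes quoted above.
# Negative look-arounds also accept the start/end of the text.
PRE = r"(?<![^ .,;:\-?!({\"'])"
POST = r"(?![^ .,;:\-?!)}\"'])"

PATTERNS = {
    name: re.compile(PRE + re.escape(marker) + r"(.+?)" + re.escape(marker) + POST)
    for marker, name in MARKERS.items()
}


def parse(text):
    """Return plain strings and (type, contents) tuples, "first match wins"."""
    result = []
    pos = 0
    while pos < len(text):
        # 1. Find the earliest match of any object pattern.
        best_name, best = None, None
        for name, pattern in PATTERNS.items():
            m = pattern.search(text, pos)
            if m and (best is None or m.start() < best.start()):
                best_name, best = name, m
        if best is None:
            result.append(text[pos:])  # no more objects: the rest is plain text
            break
        # 2. Everything before the match is plain text.
        if best.start() > pos:
            result.append(text[pos:best.start()])
        # 3. Terminal objects keep raw contents; others are parsed recursively.
        inner = best.group(1)
        result.append((best_name, inner if best_name in TERMINAL else parse(inner)))
        # 4. Continue right after the match.
        pos = best.end()
    return result


print(parse("This =is ~code~ in verbatim= text."))
# ['This ', ('verbatim', 'is ~code~ in verbatim'), ' text.']
print(parse("/a *b* c/"))
# [('italic', ['a ', ('bold', ['b']), ' c'])]
```

Because the earliest opening marker decides, the order in which the patterns are tried no longer matters.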

-- 
Ihor Radchenko // yantar92,
Org mode contributor,



Re: Inline markup: How does org identify nested code/verbatim?

2023-01-29 Thread Max Nikulin

On 30/01/2023 01:20, c.buhtz wrote:

> Please let me add the nested-regex-approach.


You should look for any markup, starting from the first one. The
org-element parser uses a "first wins" approach. Notice the following:


/italics ~code/ verbatim~

is exported as


italics ~code verbatim~

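Notice that the closing italics marker prevents the code snippet from
being recognized: italics starts first and ends before the second ~, so
neither tilde gets a partner.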
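The same example expressed as regex match positions, in a small Python sketch. The patterns are deliberately simplified and are an assumption, not Org's real ones. Both the italics and the code pattern match somewhere, but italics starts earlier and wins. After that, neither the italics contents nor the leftover text contains a complete ~...~ pair.

```python
import re

text = "/italics ~code/ verbatim~"

# Deliberately simplified patterns (an assumption): marker, shortest contents, marker.
italic = re.search(r"/(.+?)/", text)
code = re.search(r"~(.+?)~", text)

# Both patterns match, but the italics match starts first, so it wins.
print(italic.start(), repr(italic.group(1)))  # 0 'italics ~code'
print(code.start(), repr(code.group(1)))      # 9 'code/ verbatim'

# After italics wins, no complete ~...~ pair is left inside or after it.
print(re.search(r"~(.+?)~", italic.group(1)))      # None
print(re.search(r"~(.+?)~", text[italic.end():]))  # None
```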




Re: Inline markup: How does org identify nested code/verbatim?

2023-01-29 Thread c.buhtz
Please let me add the nested-regex-approach. I wouldn't call this a
solution, just an approach. Nobody can understand that regex; it is
nearly unmaintainable.

I hope for a more elegant solution.

This matches if we have code in verbatim:
^|[ .,;:\-?!({\"']=.*?(?:^|[ .,;:\-?!({\"']~.*?~[.,;:\-?!)}\"']|$).*?=[ .,;:\-?!)}\"']|$

This matches if we have verbatim in code:
(?:^|[ .,;:\-?!({\"']~.*?(?:^|[ .,;:\-?!({\"']=.*?=[.,;:\-?!)}\"']|$).*?~[ .,;:\-?!)}\"']|$)

If one of this matching I now which one of my "usual" regex pattern using 
catching groups to extract the content I should use first.

For testing (e.g. on regex101.com), here is the text I used:

This =is ~code~ in verbatim= text.
This =is usual verbatim= text.

This ~is =verbatim= in code~ text.
This ~is usual code~ text.



Inline markup: How does org identify nested code/verbatim?

2023-01-29 Thread c.buhtz
Hi folks,

this is a question about Org mode development itself.
It seems like magic to me how you do this ;) and I would like to learn
it, because I am writing a kind of Org parser in Python.

Here is a nested code-in-verbatim text.

This =is ~code~ in verbatim= text.

Exporting this to HTML (via org-html-export-as-html) gives:

This is ~code~ in verbatim text.

Awesome! :D

The point is that I'm able to identify code or verbatim with a regex
containing three capture groups for the content before, between and
after the inline markers.

for verbatim: "(^|[ .,;:\-?!({\"'])=(.*?)=([ .,;:\-?!)}\"']|$)"
for code: "(^|[ .,;:\-?!({\"'])~(.*?)~([ .,;:\-?!)}\"']|$)"
 
But they don't work together. In the example above I need to use the
verbatim regex first to get it right.

If I used the code regex first, it wouldn't work, because it would
find ~code~ without knowing that it is surrounded by =verbatim=.

I don't know what my users will feed into my software: verbatim in code
or code in verbatim. So I have to figure out which regex to use first.

How does Org solve this problem? I don't need a full working solution,
just an idea.

One approach I have in mind is to run both regexes separately and then
compare the results "somehow":

Verbatim: ['This', ' ', 'is ~code~ in verbatim', ' ', 'text.']
Code:     ['This =is', ' ', 'code', ' ', 'in verbatim= text.']

"Somehow"!
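The two lists look like the output of Python's re.split, which includes capture groups in its result. Below is a sketch that reproduces them and implements one possible "somehow": pick whichever regex has the earliest match in the line, which is the "first match wins" rule. The helper name first_regex is hypothetical. It only decides which object comes first; a full parser would repeat the decision after each match.

```python
import re

# The two regexes from the message above, written as Python raw strings.
VERBATIM = re.compile(r"(^|[ .,;:\-?!({\"'])=(.*?)=([ .,;:\-?!)}\"']|$)")
CODE = re.compile(r"(^|[ .,;:\-?!({\"'])~(.*?)~([ .,;:\-?!)}\"']|$)")

text = "This =is ~code~ in verbatim= text."
print(VERBATIM.split(text))  # ['This', ' ', 'is ~code~ in verbatim', ' ', 'text.']
print(CODE.split(text))      # ['This =is', ' ', 'code', ' ', 'in verbatim= text.']


def first_regex(line):
    """Hypothetical helper: return the regex whose match starts first, or None."""
    candidates = []
    for regex in (VERBATIM, CODE):
        m = regex.search(line)
        if m:
            candidates.append((m.start(2), regex))  # start(2): where the contents begin
    return min(candidates, key=lambda c: c[0])[1] if candidates else None


print(first_regex("This =is ~code~ in verbatim= text.") is VERBATIM)  # True
print(first_regex("This ~is =verbatim= in code~ text.") is CODE)      # True
```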

Another approach I have in mind is something I would call a nested
regex: constructing a regex pattern that looks for verbatim with code
in it, and the other way around, of course.