Re: Nested RegEx

2012-04-24 Thread nrgyzer
== Auszug aus Dmitry Olshansky (dmitry.o...@gmail.com)'s Artikel
> On 21.04.2012 22:46, H. S. Teoh wrote:
> > On Sat, Apr 21, 2012 at 09:41:18PM +0400, Dmitry Olshansky wrote:
> >> On 21.04.2012 21:24, nrgyzer wrote:
> >>> Hi guys,
> >>>
> >>> I'm trying to use std.regex to parse a string like the following:
> >>>
> >>> string myString = "preOuter {if condition1} content1 {if condition2}
content2
> >>> {elseif condition3} content3 {else}any other content{/if}{/if} postOuter";
> >>>
> >>
> >> Simply put pure regex is incapable of arbitrary-nested if-else
> >> statements. In a sense if... /if is a case of balanced parens
> >> problem and it's widely know to form non-regular language.
> >
> > This is of theoretical interest to me. Very often I find myself wanting
> > a concise way to express patterns with nested matching delimiters, but
> > regexes can't do it. But to jump to a full-blown stack-based language
> > seems like an overkill to me: stack languages can express *much* more
> > than just nested delimiters, most of which is difficult to encapsulate
> > in a nice, concise syntax like regexes. All I want is a simple way to
> > express the kind of simple nesting that matching delimiters give.
> >
> Recursive descent is not particularly bad unless minimal "grammar
> descent depth" is high. Example:
> a+b-c
> uses a quite a lot of recursive calls for grammar with e.g. 10+ operator
> precedence levels.
> I'm thinking of merging operator precedence parser with regex might be a
> happy marriage you know.
> Back to OP topic something along the lines of this will do it (beware of
> stack overflow):
> void parseIf(){
>   static int ifNest;
>   if(input.startWith("{if")){
>   ifNest++;
>   scope(exit) ifNest--;
>   enforce(ifNest < 1, "conservative stack overflow");
>   parseCond(input[2..$-1]);//regex that does condition
>   enforce(input[$-1] == '}', "close that if");
>   parseIf();//parse body or another nested if
>   //parseElse();//goes here, as does elif
>   }
>   else
>   parseBody();// the regex you used before
> }
> >
> > [...]
> >> One day I may add push-pop stack extensions (that allow this kind of
> >> thing) into regex but I'd have to think hard to make it efficient.
> > [...]
> >
> > Plus, have a nice concise syntax for it (otherwise I might as well just
> > write a recursive descent parser for it in the first place).
> >
> Concise syntax & lots of power is a priority actually, because I can
> detect if push-pop stuff is used or not and pick the right engine for
> the job. So different big-oh complexity is not important.

Thanks guys... I just solved the problem by using indexOf() from std.string for
parsing my if-statements. So... in principle, I can have unlimited nested if-
statements, but I think it's much more difficult (and slower) than using any
regex-expression.


Re: Nested RegEx

2012-04-21 Thread Dmitry Olshansky

On 21.04.2012 22:46, H. S. Teoh wrote:

On Sat, Apr 21, 2012 at 09:41:18PM +0400, Dmitry Olshansky wrote:

On 21.04.2012 21:24, nrgyzer wrote:

Hi guys,

I'm trying to use std.regex to parse a string like the following:

string myString = "preOuter {if condition1} content1 {if condition2} content2
{elseif condition3} content3 {else}any other content{/if}{/if} postOuter";



Simply put pure regex is incapable of arbitrary-nested if-else
statements. In a sense if... /if is a case of balanced parens
problem and it's widely know to form non-regular language.


This is of theoretical interest to me. Very often I find myself wanting
a concise way to express patterns with nested matching delimiters, but
regexes can't do it. But to jump to a full-blown stack-based language
seems like an overkill to me: stack languages can express *much* more
than just nested delimiters, most of which is difficult to encapsulate
in a nice, concise syntax like regexes. All I want is a simple way to
express the kind of simple nesting that matching delimiters give.



Recursive descent is not particularly bad unless minimal "grammar 
descent depth" is high. Example:

a+b-c
uses a quite a lot of recursive calls for grammar with e.g. 10+ operator 
precedence levels.



I'm thinking of merging operator precedence parser with regex might be a 
happy marriage you know.


Back to OP topic something along the lines of this will do it (beware of 
stack overflow):


void parseIf(){
static int ifNest;
if(input.startWith("{if")){
ifNest++;
scope(exit) ifNest--;
enforce(ifNest < 1, "conservative stack overflow");
parseCond(input[2..$-1]);//regex that does condition
enforce(input[$-1] == '}', "close that if");
parseIf();//parse body or another nested if
//parseElse();//goes here, as does elif
}
else
parseBody();// the regex you used before
}




[...]

One day I may add push-pop stack extensions (that allow this kind of
thing) into regex but I'd have to think hard to make it efficient.

[...]

Plus, have a nice concise syntax for it (otherwise I might as well just
write a recursive descent parser for it in the first place).



Concise syntax & lots of power is a priority actually, because I can 
detect if push-pop stuff is used or not and pick the right engine for 
the job. So different big-oh complexity is not important.


--
Dmitry Olshansky


Re: Nested RegEx

2012-04-21 Thread H. S. Teoh
On Sat, Apr 21, 2012 at 09:41:18PM +0400, Dmitry Olshansky wrote:
> On 21.04.2012 21:24, nrgyzer wrote:
> >Hi guys,
> >
> >I'm trying to use std.regex to parse a string like the following:
> >
> >string myString = "preOuter {if condition1} content1 {if condition2} content2
> >{elseif condition3} content3 {else}any other content{/if}{/if} postOuter";
> >
> 
> Simply put pure regex is incapable of arbitrary-nested if-else
> statements. In a sense if... /if is a case of balanced parens
> problem and it's widely know to form non-regular language.

This is of theoretical interest to me. Very often I find myself wanting
a concise way to express patterns with nested matching delimiters, but
regexes can't do it. But to jump to a full-blown stack-based language
seems like an overkill to me: stack languages can express *much* more
than just nested delimiters, most of which is difficult to encapsulate
in a nice, concise syntax like regexes. All I want is a simple way to
express the kind of simple nesting that matching delimiters give.


[...]
> One day I may add push-pop stack extensions (that allow this kind of
> thing) into regex but I'd have to think hard to make it efficient.
[...]

Plus, have a nice concise syntax for it (otherwise I might as well just
write a recursive descent parser for it in the first place).


T

-- 
The day Microsoft makes something that doesn't suck is probably the day
they start making vacuum cleaners... -- Slashdotter


Re: Nested RegEx

2012-04-21 Thread Dmitry Olshansky

On 21.04.2012 21:24, nrgyzer wrote:

Hi guys,

I'm trying to use std.regex to parse a string like the following:

string myString = "preOuter {if condition1} content1 {if condition2} content2
{elseif condition3} content3 {else}any other content{/if}{/if} postOuter";



Simply put pure regex is incapable of arbitrary-nested if-else 
statements. In a sense if... /if is a case of balanced parens problem 
and it's widely know to form non-regular language.


The way out is to either preconstruct regex pattern for a given maximum 
depth or just use handwritten parser (it may use regex for parts of 
input internally).


One day I may add push-pop stack extensions (that allow this kind of 
thing) into regex but I'd have to think hard to make it efficient.


(look at .NET regex they kind of have syntax for this but it's too 
underpowered for my taste)



Is there any chance to use std.regex to parse the string above? I currently
used the following expression:

auto r = regex(`(.*?)\{if:(?P(.+?))\}(?P(.*))(\{/if\})
(.*)`, "g");

but it doesn't fit my nested if- and else-statements correctly.

Thanks in advance for any suggestions!



--
Dmitry Olshansky


Nested RegEx

2012-04-21 Thread nrgyzer
Hi guys,

I'm trying to use std.regex to parse a string like the following:

string myString = "preOuter {if condition1} content1 {if condition2} content2
{elseif condition3} content3 {else}any other content{/if}{/if} postOuter";

Is there any chance to use std.regex to parse the string above? I currently
used the following expression:

auto r = regex(`(.*?)\{if:(?P(.+?))\}(?P(.*))(\{/if\})
(.*)`, "g");

but it doesn't fit my nested if- and else-statements correctly.

Thanks in advance for any suggestions!