At 4:11 PM +0200 on 6/24/99, M. Uli Kusterer wrote:

>>      find "hello" & a string matched & "world" in "hello, hello world
>>world!"
>>Is it ", hello " or ", hello world"?
>
> You mean what "matched" would contain? Good question. I think it should be
>", hello ", as "world" is specified as the other criterion after the "&".

But there are two different ways to match "world". One matches the entire
string, one does not; which is better?

In most regexp packages, a maximal match is the default. And experience hath
shewn that neither maximal nor minimal matching works all the time.
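
To make the two candidates concrete, here is a rough sketch in Python's
`re` module (not Interpreter code; the variable names are mine). It runs
the example string from above through a minimal and a maximal wildcard:

```python
import re

text = "hello, hello world world!"

# Non-greedy (minimal): ".*?" stops at the first "world" after "hello".
minimal = re.search(r"hello(.*?)world", text).group(1)

# Greedy (maximal): ".*" runs on to the last "world" that still matches.
maximal = re.search(r"hello(.*)world", text).group(1)

print(repr(minimal))  # ', hello '
print(repr(maximal))  # ', hello world '
```

Both searches start at the first "hello"; they differ only in which "world"
ends the match.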

>Of
>course, if you did the find in "hello world", "matched" would contain a
>single space.
>
>>Must a match match the entire "in ..." part? Or should substring matches be
>>allowed? Does "a string" eat as few or as many characters as possible? Is
>>backtracking required?
>
> "a string" would eat as few characters as necessary to fulfill the
>criteria. For "hello" & a string & "world" it would eat as many characters
>as are between the first "hello" in the string and the first "world" after
>that. The key here is finding the first match. Any subsequent ones need a
>chunk expression or some other means to make the "find" command ignore the
>start of the string passed.

But what about when you need a match that eats as much as possible?

>
>>It would probably add--ummm--noticeable complexity to Interpreter, but most
>>of the code should be available in a regexp library somewhere. I could
>>probably swipe it from Perl, for example. Main problem would be translation
>>to a regexp.
>
> Shouldn't be too hard to make a regexp out of this, I guess. It's like
>parsing a string, where "a string" would resolve to a wildcard character
>"*". You'd just have to get it to be immediately seen as an expression, it
>shouldn't be parsed as a string.

I think the best syntax would be something like this:

        find expression "hello" & [a] [minimal | maximal] string [<container>]\
        in [<search_container>]

"Find expression" so we don't have to implement infinite-lookahead parsers;
minimal vs. maximal allows a non-greedy vs. greedy (i.e., eat up all
characters possible) match; and if search_container is not provided, it
searches the stack (like all other find commands). Also, if <container> is
provided, the matched text is stored to <container>. Naturally, the sequence
does not have a length limit, although this syntax description shows one.
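
As a rough illustration of the translation step, here is a sketch in Python
(not Interpreter code). It assumes the parser has already reduced the
expression to a list of parts; `build_pattern`, `find_expression`, and the
`("string", mode)` tuple are hypothetical names made up for this example.
I am also assuming "a string" may span line breaks, hence `re.DOTALL`.

```python
import re

def build_pattern(parts):
    """Turn a list of parts into a regexp string.

    Each part is either a literal str or a ("string", mode) tuple,
    where mode is "minimal" or "maximal" (hypothetical representation).
    """
    pieces = []
    for part in parts:
        if isinstance(part, str):
            # Literal text: escape it so "." or "*" are not treated as operators.
            pieces.append(re.escape(part))
        else:
            _, mode = part
            # Wildcard: capture it so the matched text can go into a container.
            pieces.append("(.*?)" if mode == "minimal" else "(.*)")
    return "".join(pieces)

def find_expression(parts, text):
    """Return the wildcard captures of the first match, or None if no match."""
    m = re.search(build_pattern(parts), text, re.DOTALL)
    return list(m.groups()) if m else None

text = "hello, hello world world!"
print(find_expression(["hello", ("string", "minimal"), "world"], text))
print(find_expression(["hello", ("string", "maximal"), "world"], text))
```

The list can hold any number of literals and wildcards, which is the "no
length limit" point; each wildcard's capture is what would be stored to its
<container>.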
