Hi Richard,

Richard Ostrochovský (<richard.ostrochov...@gmail.com>) wrote on
Mon, 9 December 2019 at 01:57:

> Hello colleagues,
>
> I was searching for the answer here:
> https://simple-evcorr.github.io/man.html
> https://sourceforge.net/p/simple-evcorr/mailman/simple-evcorr-users/
> and haven't found the answer, so I am posting a new question here:
>
> Does SEC in pattern= parameters support RegExp modifiers (
> https://perldoc.perl.org/perlre.html#Modifiers) somehow?
>

If you enclose a regular expression within /.../, SEC does not treat the
slashes as delimiters but as part of the regular expression, so you
can't provide modifiers at the end of the expression after the closing /.
However, Perl regular expressions allow modifiers to be given inline with
the (?modifiers) construct, for example (?i). The following pattern
matches the string "test" case-insensitively:
pattern=(?i)test
In addition, such modifiers can appear anywhere in the regular expression
and take effect only from that point onward, which makes them more
flexible than modifiers after /. For example, the following pattern
matches the strings "test" and "tesT":
pattern=tes(?i)t
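
The scoping of inline modifiers can be tried outside SEC. The sketch below
is only an illustration and uses Python's re module as a stand-in for
Perl; the helper name matches() is hypothetical. Python does not support
switching the flag mid-pattern with a bare (?i), so the scoped form
(?i:...) is used for the second pattern; Perl accepts that form as well.

```python
import re

# Python's re module is used only as a stand-in for Perl's regex engine.
# A leading (?i) makes the whole expression case insensitive.
whole = re.compile(r"(?i)test")

# Scoped form: only the final "t" is matched case-insensitively.
# This plays the role of tes(?i)t from the Perl example.
last_only = re.compile(r"tes(?i:t)")

def matches(regex, text):
    """Return True if the compiled regex matches anywhere in text."""
    return regex.search(text) is not None

print(matches(whole, "TEST"))      # True
print(matches(last_only, "tesT"))  # True
print(matches(last_only, "TEST"))  # False
```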

The SEC FAQ also has a short discussion of this topic:
https://simple-evcorr.github.io/FAQ.html#13)


> E.g. modifiers /x or /xx allow writing more readable expressions by
> ignoring unescaped whitespaces (implies possible multi-line regular
> expressions). It could be practical in case of more complex expressions, to
> let them being typed more legibly. Some simpler example:
>
> pattern=/\
> ^\s*([A-Z]\s+)?\
> (?<data_source_timestamp>\
>    (\
>       ([\[\d\-\.\:\s\]]*[\d\]]) |\
>       (\
>          (Mon|Tue|Wed|Thu|Fri|Sat|Sun) \s+\
>          (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \s+ \d+ \s+
> \d\d:\d\d:\d\d \s+ ([A-Z]+\s+)?\d\d\d\d\
>       )\
>    )\
> ) (?<message>.*)/x
>
>
It is a tricky question, since the SEC configuration file format natively
allows regular expressions to span multiple lines without the use of the
(?x) modifier. If a line in a rule definition ends with a backslash, the
backslash is removed and the following line is appended to the current
line during configuration file parsing. For example, the following two
pattern definitions are equivalent:

pattern=test: \
(\S+) \
(\S+)$

pattern=test: (\S+) (\S+)$
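
The joining step can be modelled in a few lines. This is a simplified
sketch of the behaviour described above, not SEC's actual parser, and the
function name join_continuations() is hypothetical:

```python
def join_continuations(text):
    """Join each line ending in a backslash with the line that follows.

    Simplified model of the described behaviour, not SEC's real parser.
    """
    joined = []
    current = ""
    for line in text.split("\n"):
        if line.endswith("\\"):
            current += line[:-1]       # drop the backslash, keep reading
        else:
            joined.append(current + line)
            current = ""
    if current:
        joined.append(current)         # last line ended with a backslash
    return joined

multi_line = "pattern=test: \\\n(\\S+) \\\n(\\S+)$"
single_line = "pattern=test: (\\S+) (\\S+)$"

print(join_continuations(multi_line) == [single_line])  # True
```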

However, it is important to remember that SEC converts multi-line rule
fields into single-line format before any other processing, including the
compilation of regular expressions. In other words, for the first
(multi-line) pattern definition above, SEC actually sees
"test: (\S+) (\S+)$" when it compiles the expression. This introduces the
following caveat: if you use the (?x) modifier to put a # comment into a
multi-line regular expression, the expression is joined into a single
line before it is compiled and before (?x) has any effect, so the comment
unexpectedly runs until the end of the regular expression. Consider the
following example:

pattern=(?x)test:\
# this is a comment \
(\S+)$

Internally, this definition is first converted to single-line format:
pattern=(?x)test:# this is a comment (\S+)$
Under (?x), a # comment extends to the end of the line, so everything
after # is discarded and the expression is effectively
(?x)test:
which is not what we want.

To address this issue, you can limit the scope of comments with (?#...)
constructs that don't require (?x). For example:

pattern=test:\
(?# this is a comment )\
(\S+)$

During configuration file parsing this expression is converted into
"test:(?# this is a comment )(\S+)$", and after dropping the comment it
becomes "test:(\S+)$" as we expect.
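
Both cases can be reproduced outside SEC. The sketch below again uses
Python's re module as a stand-in for Perl (both treat # under (?x) and
(?#...) the same way for this purpose), and join_lines() is a hypothetical
helper that only mimics the backslash joining described earlier:

```python
import re

def join_lines(text):
    """Mimic the joining step: drop each backslash-newline pair."""
    return text.replace("\\\n", "")

# Comment written with (?x) and #, as in the problematic example.
with_x = join_lines("(?x)test:\\\n# this is a comment \\\n(\\S+)$")
# Comment written with (?#...), as in the suggested fix.
with_group = join_lines("test:\\\n(?# this is a comment )\\\n(\\S+)$")

line = "test:value"
m1 = re.search(with_x, line)
m2 = re.search(with_group, line)

print(with_x)                     # (?x)test:# this is a comment (\S+)$
print(re.compile(with_x).groups)  # 0, the capture group was swallowed
print(m1.group(0))                # test:
print(m2.group(1))                # value
```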

Hope this helps,
risto
_______________________________________________
Simple-evcorr-users mailing list
Simple-evcorr-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users
