Re: Does the dollar sign mean "end of file"?

Steve Lawrence Tue, 27 Aug 2019 13:52:49 -0700

(?s) is a special regex flag that changes the behavior of regex matching
so that dot matches newlines. By itself it doesn't match anything or do
anything else. And it doesn't need to be immediately before a dot. The
(?s) just needs to appear somewhere before the dot to enable the flag,
and it then applies to all dots appearing after it in the regex. You can
also disable the flag by using (?-s).
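A minimal Java sketch of the behavior described above, using java.util.regex.Pattern. The class DotAllDemo and its helper fullMatch are hypothetical names for illustration; each call checks whether the whole input matches the pattern.

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    // Hypothetical helper: true if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String twoLines = "a\nb";

        // Default: '.' does not match a newline.
        System.out.println(fullMatch("a.b", twoLines));       // false

        // (?s) turns on DOTALL, so '.' now matches the newline.
        System.out.println(fullMatch("(?s)a.b", twoLines));   // true

        // The flag does not have to sit right before the dot; it covers
        // every dot that comes after it in the pattern.
        System.out.println(fullMatch("(?s)a+.b+", twoLines)); // true

        // A dot that comes before the flag is not affected by it.
        System.out.println(fullMatch("a.(?s)b", twoLines));   // false

        // The flag itself consumes no characters.
        System.out.println(fullMatch("(?s)", ""));            // true
    }
}
```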


So for example, this regex:

  (?s)foo...(?-s)bar...

Would match "foo" followed by any three characters (including
newlines), followed by "bar" and any three characters (excluding
newlines).
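The example regex can be checked directly with java.util.regex. This is a small sketch; ScopedFlagDemo and matchesExample are hypothetical names, and the test strings are made up to put a newline in each half of the match.

```java
import java.util.regex.Pattern;

public class ScopedFlagDemo {
    // DOTALL is on for the first three dots and switched off again by
    // (?-s) for the last three.
    static final Pattern EXAMPLE = Pattern.compile("(?s)foo...(?-s)bar...");

    static boolean matchesExample(String input) {
        return EXAMPLE.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Newline among the three characters after "foo": allowed.
        System.out.println(matchesExample("foo\n12bar345")); // true
        // Newline among the three characters after "bar": rejected.
        System.out.println(matchesExample("foo123bar\n45")); // false
        // No newlines at all: matches.
        System.out.println(matchesExample("fooabcbarxyz"));  // true
    }
}
```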

There are a handful of other flags for things like case insensitivity
(?i), comments (?x), and multiline (?m) too. The Javadoc for Pattern
describes these flags and how they change the behavior of a regex.

https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
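A rough Java sketch of those other embedded flags, each next to the Pattern constant it corresponds to. EmbeddedFlagsDemo, fullMatch, and find are hypothetical names, and the sample strings are borrowed from Roger's input only for illustration.

```java
import java.util.regex.Pattern;

public class EmbeddedFlagsDemo {
    // True if the whole input matches.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the regex matches anywhere in the input.
    static boolean find(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // (?i): case-insensitive matching (Pattern.CASE_INSENSITIVE).
        System.out.println(fullMatch("(?i)broccoli", "BROCCOLI"));                 // true

        // (?x): comments mode (Pattern.COMMENTS). Whitespace in the pattern is
        // ignored and '#' starts a comment running to the end of the line.
        System.out.println(fullMatch("(?x) 3 ABC  # digit then letters", "3ABC")); // true

        // (?m): multiline mode (Pattern.MULTILINE). '^' and '$' match at line
        // boundaries, not only at the start and end of the input.
        String data = "Broccoli\n3ABC";
        System.out.println(find("^3ABC$", data));                                  // false
        System.out.println(find("(?m)^3ABC$", data));                              // true

        // The compile-time constant does the same job as the embedded flag.
        System.out.println(Pattern.compile("broccoli", Pattern.CASE_INSENSITIVE)
                .matcher("Broccoli").matches());                                   // true
    }
}
```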

 Steve


On 8/27/19 2:57 PM, Costello, Roger L. wrote:
> Hi Steve,
> 
> What is (?s)?
> 
> Can it be placed before something other than a period symbol? What does it do 
> in that case?
> 
> When it is placed before a period symbol, it means "Please consume any 
> character, including newlines" ... is that right?
> 
> /Roger
> 
> -----Original Message-----
> From: Steve Lawrence <[email protected]> 
> Sent: Tuesday, August 27, 2019 1:16 PM
> To: [email protected]
> Subject: [EXT] Re: Does the dollar sign mean "end of file"?
> 
> The issue here is that the dot character doesn't match newlines. So your 
> expression is essentially just looking for one or more non-newline characters 
> up until the end of the data. Your field contains a newline, so the regular 
> expression fails there, doesn't match, and results in a zero-length string.
> 
> If you want dot to match a newline, you can put the "(?s)" flag before the 
> regex.
> 
> You can also simplify the expression a bit. You don't need to make the .+ 
> non-greedy, and the $ doesn't need to be in a forward lookahead. So the 
> following should work and is a bit more compact:
> 
>  dfdl:lengthPattern="(?s).+$"
> 
> That will match one or more characters (including newlines) up until the end 
> of the data.
> 
> 
> 
> On 8/27/19 12:43 PM, Costello, Roger L. wrote:
>> Hello DFDL community,
>>
>> My input is this:
>>
>> Hello, World Blah
>> Broccoli
>> 3ABC
>>
>> I want it parsed to this:
>>
>> <input>
>> <A>Hello, World</A>
>> <B>Blah</B>
>> <C>Broccoli
>> 3ABC</C>
>> </input>
>>
>> That is, the first field is exactly 12 characters. The second field 
>> extends up to the newline. The third field is the rest.
>>
>> Below is my DFDL schema. It produces this result:
>>
>> <input>
>> <A>Hello, World</A>
>> <B>Blah</B>
>> <C></C>
>> </input>
>>
>> along with a warning message saying that a bunch of bytes remain.
>>
>> Why do I get that result instead of the desired result?  /Roger
>>
>> <xs:element name="input">
>>   <xs:complexType>
>>     <xs:sequence>
>>       <xs:element name="A" type="xs:string"
>>                   dfdl:lengthKind="explicit"
>>                   dfdl:length="12"
>>                   dfdl:lengthUnits="characters"/>
>>       <xs:element name="B" type="xs:string"
>>                   dfdl:lengthKind="delimited"
>>                   dfdl:terminator="%NL;"/>
>>       <xs:element name="C" type="xs:string"
>>                   dfdl:lengthKind="pattern"
>>                   dfdl:lengthPattern=".+?(?=$)"/>
>>     </xs:sequence>
>>   </xs:complexType>
>> </xs:element>
>>
>
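A rough Java sketch of why field C came out empty with the original lengthPattern and why "(?s).+$" picks up the rest of the data. It rests on assumptions not stated in the thread: that the DFDL processor matches lengthPattern at the start of the remaining data the way Java's Matcher.lookingAt() does, that a failed match yields a zero-length field (as described above), and that the input has no trailing newline after "3ABC". LengthPatternDemo and matchedLength are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LengthPatternDemo {
    // Hypothetical model of a lengthPattern: how many characters the pattern
    // claims at the start of the remaining data. Assumes prefix matching as
    // with Matcher.lookingAt(); a failed match is reported as 0 (empty field).
    static int matchedLength(String lengthPattern, String remaining) {
        Matcher m = Pattern.compile(lengthPattern).matcher(remaining);
        return m.lookingAt() ? m.end() : 0;
    }

    public static void main(String[] args) {
        // Assumed data left for element C after A, B, and B's %NL; terminator.
        String remaining = "Broccoli\n3ABC";

        // Original pattern: '.' cannot cross the newline, so the lookahead for
        // end of data is never reached and the match fails.
        System.out.println(matchedLength(".+?(?=$)", remaining)); // 0

        // Suggested pattern: with (?s), '.+' runs over the newline to the end.
        System.out.println(matchedLength("(?s).+$", remaining));  // 13

        // With no newline in the data, the original pattern would have worked.
        System.out.println(matchedLength(".+?(?=$)", "3ABC"));    // 4
    }
}
```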
