Thank you, Mike,

Your suggestion works perfectly!

From: Mike Beckerle <mbecke...@apache.org> 
Sent: Sunday, August 24, 2025 3:05 PM
To: users@daffodil.apache.org
Subject: Re: Using UTF-8 in a regular expression?

No clue about "option setting". I think that's spurious.

I think the notation for the characters you want in an XSD pattern regex is 
"&#x0041;", the same as in any XSD string literal. (XSD regular expressions have 
no "\u" escape, which is likely what the error is complaining about.) But the 
maximum number of hex digits is 6, because the highest code point in the Unicode 
standard is U+10FFFF, so your "\u10FFFFFC" has too many hex digits.
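
To make the digit-count point concrete, here is a minimal Java sketch. "CodePointCheck" is a hypothetical helper, not part of Daffodil or Xerces; it only compares the parsed hex value against Character.MAX_CODE_POINT (0x10FFFF). Leading zeros are harmless, so "00000041" still names U+0041; the actual problem is that the value 0x10FFFFFC lies above U+10FFFF.

```java
// Hypothetical helper (not part of Daffodil or Xerces): checks whether the
// hex digits of an escape such as "&#x...;" name a valid Unicode code point.
final class CodePointCheck {

    /** Returns true if hexDigits parses to a value in U+0000..U+10FFFF. */
    static boolean isValidCodePoint(String hexDigits) {
        // Parse as a long so inputs with a few extra digits cannot overflow.
        long value = Long.parseLong(hexDigits, 16);
        return value >= Character.MIN_CODE_POINT && value <= Character.MAX_CODE_POINT;
    }

    public static void main(String[] args) {
        // The two values from the original pattern, then the values in the suggested pattern.
        String[] candidates = {"00000041", "10FFFFFC", "41", "10FFFC"};
        for (String hex : candidates) {
            System.out.println(hex + " -> " + isValidCodePoint(hex));
        }
        // Character.MAX_CODE_POINT is 0x10FFFF: six hex digits.
        System.out.println("max = " + Integer.toHexString(Character.MAX_CODE_POINT).toUpperCase());
    }
}
```

Running it prints true for 00000041, 41 and 10FFFC, false for 10FFFFFC, and then "max = 10FFFF".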

But even after correcting that, I am not sure you can get away with "&#x10FFFC;", 
because that character takes two 16-bit Java (JVM) chars to express, a high and 
a low surrogate, and I don't know whether the regex engine supports that. 
Hopefully it does.
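
The surrogate-pair concern can be seen with the standard java.lang.Character API. This sketch (the class name "SurrogateDemo" is made up for illustration) shows that U+10FFFC occupies two Java chars while U+0041 needs only one. It says nothing about whether a particular regex engine accepts such a pair as a character-range endpoint.

```java
// Illustrative class (name made up): shows how Java stores a code point
// above U+FFFF as two UTF-16 chars, a high surrogate then a low surrogate.
final class SurrogateDemo {

    /** Returns the UTF-16 code units Java uses for the given code point. */
    static char[] utf16Units(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        int cp = 0x10FFFC;
        char[] units = utf16Units(cp);
        System.out.println("chars for U+10FFFC: " + Character.charCount(cp));
        System.out.printf("high = U+%04X, low = U+%04X%n", (int) units[0], (int) units[1]);
        System.out.println("high surrogate? " + Character.isHighSurrogate(units[0]));
        System.out.println("low surrogate?  " + Character.isLowSurrogate(units[1]));

        // Contrast: U+0041 ('A') fits in a single char.
        System.out.println("chars for U+0041: " + Character.charCount(0x41));
    }
}
```

It reports 2 chars for U+10FFFC (high = U+DBFF, low = U+DFFC) and 1 char for U+0041.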

So your regex should be "[&#x41;-&#x10FFFC;]*" 
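
A related illustration, not a test of Xerces' regex engine itself: the XML parser resolves "&#x...;" character references before the schema processor sees the pattern, but it leaves "\u" text alone because XML has no backslash escapes. This sketch (hypothetical class "PatternAttributeDemo", using only the JDK's built-in DOM parser) shows the original pattern arriving verbatim, while the suggested one arrives as "[A-", a surrogate pair, and "]*". Whether the XSD regex engine then accepts that pair as a range endpoint is still the open question raised above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Parses a throwaway XML element and returns its "value" attribute as an XML
// parser delivers it to a schema processor, i.e. after character references
// have been resolved. It does NOT run any XSD regex engine.
final class PatternAttributeDemo {

    static String parsedPatternValue(String valueAsWritten) {
        // Hypothetical one-element document; only the attribute value matters.
        String xml = "<pattern value=\"" + valueAsWritten + "\"/>";
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("value");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // XML has no backslash escapes, so these reach the regex engine verbatim.
        String original = parsedPatternValue("[\\u00000041-\\u10FFFFFC]*");
        // Character references are replaced by the characters they name.
        String suggested = parsedPatternValue("[&#x41;-&#x10FFFC;]*");

        System.out.println("original:  " + original);
        System.out.println("suggested: " + suggested.length() + " chars, "
                + suggested.codePointCount(0, suggested.length()) + " code points");
        System.out.printf("upper bound: U+%X%n", suggested.codePointAt(3));
    }
}
```

Running it should print the original pattern unchanged, then "7 chars, 6 code points" for the suggested one, with an upper bound of U+10FFFC.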

Note that ChatGPT5 seems to think this will work with the XercesJ validator 
(which is the one built into Daffodil for "full" validation). See 
https://chatgpt.com/share/68ab6236-cd84-800f-801f-a08977e6c4dc

On Fri, Aug 22, 2025 at 4:01 PM Mark Kozak <mark.ko...@adeptus-cs.com> wrote:

Hello folks,

I am using Daffodil version 3.10.

When using a pattern restriction as shown below:

            <element name="value" dfdl:terminator="%NL;">
                <simpleType>
                    <restriction base="xs:string">
                        <pattern value="[\u00000041-\u10FFFFFC]*"/>
                    </restriction>
                </simpleType>
            </element>

I get the following error:

[error] Schema Definition Error: Error loading schema due to InvalidRegex: 
Pattern value '[\u00000041-\u10FFFFFC]*' is not a valid regular expression. The 
reported error was: 'This expression is not supported in the current option 
setting.'.

What is the option setting I need to change to make that work?

Mark Kozak

Director of Engineering

Adeptus Cyber Solutions

Adeptus-CS.com
